[Pywikipedia-l] [ pywikipediabot-Bugs-1893001 ] Sporadic UnicodeDecodeError for German umlauts (utf8)

SourceForge.net noreply at sourceforge.net
Fri May 16 04:38:59 UTC 2008


Bugs item #1893001, was opened at 2008-02-13 10:53
Message generated for change (Comment added) made by nobody
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1893001&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Sporadic UnicodeDecodeError for German umlauts (utf8)

Initial Comment:
The error below sporadically stops my bot from working. To reproduce it, run the bot on de.wiktionary.org against the following articles, which contain German umlauts (taken from my skipfile; at least one of them will certainly trigger the error):

[[Aussprachewörterbuch]]
[[Fachwörterbuch]]
[[Fräswerkzeug]]
[[Gelände]]
[[Geländer]]
[[Geländewagen]]
[[Gelübde]]
[[Herkunftswörterbuch]]

Error message:

Traceback (most recent call last):
  File "interwiki.py", line 1572, in ?
    bot.run()
  File "interwiki.py", line 1347, in run
    self.queryStep()
  File "interwiki.py", line 1321, in queryStep
    self.oneQuery()
  File "interwiki.py", line 1317, in oneQuery
    subject.workDone(self)
  File "interwiki.py", line 659, in workDone
    (skip, alternativePage) = self.disambigMismatch(page)
  File "interwiki.py", line 522, in disambigMismatch
    if self.originPage.isDisambig() and not page.isDisambig():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 983, in isDisambig
    for tn in self.templates():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1561, in templates
    return [template for (template, param) in self.templatesWithParams()]
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1609, in templatesWithParams
    name = Page(self.site(), name).title()
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 317, in __init__
    t = url2unicode(t, site = insite, site2 = site)
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 3415, in url2unicode
    raise firstException
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13: invalid data

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-15 21:38

Message:
Logged In: NO 

The problem is still present (r5380) on wikt:de:
[[Inkommensurabilität]]

----------------------------------------------------------------------

Comment By: Rotem Liss (rotemliss)
Date: 2008-02-13 12:48

Message:
Logged In: YES 
user_id=1327030
Originator: NO

One template uses a URL-encoded character that appears to be encoded in
ISO-8859-1 rather than UTF-8, and this seems to crash the bot. I'm not sure
what the best way to handle it is; possibly adding ISO-8859-1 as another
encoding for wiktionary:de? (wikipedia:de already has it as a "historical
encoding" and parses the template successfully.)
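
To illustrate the fallback idea, here is a minimal sketch in modern Python. It
is not the actual url2unicode code from wikipedia.py; the function name
url_to_unicode and the encodings parameter are hypothetical, chosen only for
this example. It tries each encoding in order and, if all fail, re-raises the
first error, which matches the "raise firstException" seen in the traceback.

```python
from urllib.parse import unquote_to_bytes


def url_to_unicode(title, encodings=("utf-8", "iso-8859-1")):
    """Decode a percent-encoded title, trying each encoding in order.

    Hypothetical helper for illustration; not the pywikipediabot API.
    """
    # Turn "%E4" style escapes back into raw bytes without decoding them yet.
    raw = unquote_to_bytes(title)
    first_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as err:
            # Remember only the first failure so it can be re-raised later.
            if first_error is None:
                first_error = err
    if first_error is None:
        raise ValueError("no encodings were given")
    raise first_error


# "%E4" is "ä" in ISO-8859-1 but is not valid UTF-8 in this position.
print(url_to_unicode("Gel%E4nde"))
# The UTF-8 form of the same title decodes on the first attempt.
print(url_to_unicode("Gel%C3%A4nde"))
```

With only UTF-8 in the list, the first call raises UnicodeDecodeError, which is
the failure reported above. Note that ISO-8859-1 accepts any byte sequence, so
it should come last in the list; otherwise it would hide valid UTF-8 titles.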

----------------------------------------------------------------------
