Bugs item #1893001, was opened at 2008-02-13 10:53
Message generated for change (Comment added) made by nobody
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1893001...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Sporadic UnicodeDecodeError for German umlauts (utf8)
Initial Comment:
The very annoying error message below sometimes stops my bot from working. To reproduce, test on de.wiktionary.org with the following articles containing German umlauts (at least one of them will certainly trigger this error; taken from the skipfile):
[[Aussprachewörterbuch]] [[Fachwörterbuch]] [[Fräswerkzeug]] [[Gelände]] [[Geländer]] [[Geländewagen]] [[Gelübde]] [[Herkunftswörterbuch]]
Error message:
Traceback (most recent call last):
  File "interwiki.py", line 1572, in ?
    bot.run()
  File "interwiki.py", line 1347, in run
    self.queryStep()
  File "interwiki.py", line 1321, in queryStep
    self.oneQuery()
  File "interwiki.py", line 1317, in oneQuery
    subject.workDone(self)
  File "interwiki.py", line 659, in workDone
    (skip, alternativePage) = self.disambigMismatch(page)
  File "interwiki.py", line 522, in disambigMismatch
    if self.originPage.isDisambig() and not page.isDisambig():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 983, in isDisambig
    for tn in self.templates():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1561, in templates
    return [template for (template, param) in self.templatesWithParams()]
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1609, in templatesWithParams
    name = Page(self.site(), name).title()
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 317, in __init__
    t = url2unicode(t, site = insite, site2 = site)
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 3415, in url2unicode
    raise firstException
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13: invalid data
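A minimal Python 3 sketch of this class of failure (not code from wikipedia.py; the bot itself ran on Python 2), assuming the offending title contains a percent-escape encoded in ISO-8859-1 rather than UTF-8. The helper `decodes_as_utf8` is hypothetical; it only shows that the same umlaut percent-encodes differently in the two charsets and that the Latin-1 form is not valid UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes


def decodes_as_utf8(encoded):
    """Hypothetical helper: True if the percent-decoded bytes are valid UTF-8."""
    try:
        unquote_to_bytes(encoded).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "ä" percent-encodes as %C3%A4 in UTF-8 but as a single %E4 in ISO-8859-1.
utf8_title = quote("Gelände", encoding="utf-8")
latin1_title = quote("Gelände", encoding="iso-8859-1")

# 0xE4 looks like the start of a multi-byte UTF-8 sequence, but the byte
# after it ("n") is not a continuation byte, so strict UTF-8 decoding fails.
print(utf8_title, decodes_as_utf8(utf8_title))
print(latin1_title, decodes_as_utf8(latin1_title))
```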
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody) Date: 2008-05-15 21:38
Message: Logged In: NO
Problem still present (r5380) on wikt:de: [[Inkommensurabilität]]
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss) Date: 2008-02-13 12:48
Message: Logged In: YES user_id=1327030 Originator: NO
One template uses a URL-encoded character that appears to be encoded in ISO-8859-1 rather than UTF-8, and this seems to crash the bot. I'm not sure what the best way to handle it is; perhaps add ISO-8859-1 as an additional encoding for wiktionary:de? (wikipedia:de already has it as a "historical encoding" and parses the template successfully.)
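A minimal sketch of the suggested workaround, assuming a per-site list of encodings that is tried in order, with ISO-8859-1 appended as a historical fallback. `url_to_unicode` and `SITE_ENCODINGS` are hypothetical names, not the wikipedia.py API; the sketch only loosely mirrors url2unicode, which, judging from the traceback, ends by raising `firstException` when nothing works.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical per-site encoding list: the main encoding first, then any
# "historical" encodings (here ISO-8859-1) to try as fallbacks.
SITE_ENCODINGS = ("utf-8", "iso-8859-1")


def url_to_unicode(encoded, encodings=SITE_ENCODINGS):
    """Percent-decode `encoded` and return the first successful text decoding.

    Raises the first UnicodeDecodeError if none of the encodings succeeds.
    """
    raw = unquote_to_bytes(encoded)
    first_error = None
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError as exc:
            # Remember only the first failure, like firstException.
            if first_error is None:
                first_error = exc
    if first_error is None:
        raise ValueError("no encodings given")
    raise first_error
```

Because ISO-8859-1 maps every byte to some character, it never fails and has to come last in the list; the flip side is that genuinely corrupt input gets decoded silently instead of being reported.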
----------------------------------------------------------------------