[Pywikipedia-l] [ pywikipediabot-Bugs-1893001 ] Sporadic UnicodeDecodeError for German umlauts (utf8)
SourceForge.net
noreply at sourceforge.net
Fri May 16 04:38:59 UTC 2008
Bugs item #1893001, was opened at 2008-02-13 10:53
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1893001&group_id=93107
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Sporadic UnicodeDecodeError for German umlauts (utf8)
Initial Comment:
The very annoying error message below sometimes stops my bot from working. Test it on de.wiktionary.org with the following articles containing German umlauts (one will certainly give this error; from skipfile):
[[Aussprachewörterbuch]]
[[Fachwörterbuch]]
[[Fräswerkzeug]]
[[Gelände]]
[[Geländer]]
[[Geländewagen]]
[[Gelübde]]
[[Herkunftswörterbuch]]
Error message:
Traceback (most recent call last):
  File "interwiki.py", line 1572, in ?
    bot.run()
  File "interwiki.py", line 1347, in run
    self.queryStep()
  File "interwiki.py", line 1321, in queryStep
    self.oneQuery()
  File "interwiki.py", line 1317, in oneQuery
    subject.workDone(self)
  File "interwiki.py", line 659, in workDone
    (skip, alternativePage) = self.disambigMismatch(page)
  File "interwiki.py", line 522, in disambigMismatch
    if self.originPage.isDisambig() and not page.isDisambig():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 983, in isDisambig
    for tn in self.templates():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1561, in templates
    return [template for (template, param) in self.templatesWithParams()]
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1609, in templatesWithParams
    name = Page(self.site(), name).title()
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 317, in __init__
    t = url2unicode(t, site = insite, site2 = site)
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 3415, in url2unicode
    raise firstException
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13: invalid data
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-15 21:38
Message:
Logged In: NO
Problem still there (r5380) at wikt:de:
[[Inkommensurabilität]]
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-02-13 12:48
Message:
Logged In: YES
user_id=1327030
Originator: NO
One template uses a URL-encoded character that appears to be encoded in
ISO-8859-1 rather than UTF-8, and this seems to crash the bot. I'm not sure
what the best way to handle it is; possibly add ISO-8859-1 as another
encoding for wiktionary:de? (wikipedia:de already has it as a "historical
encoding" and parses the template successfully.)
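A minimal sketch of the fallback idea described above, written for modern Python 3 rather than the 2008 code base. The function name url_to_unicode and its encodings parameter are hypothetical and are not taken from wikipedia.py; the sketch only illustrates trying UTF-8 first and then a "historical" encoding such as ISO-8859-1 before giving up.

```python
from urllib.parse import unquote_to_bytes


def url_to_unicode(title, encodings=("utf-8", "iso-8859-1")):
    """Percent-decode a title, then try each encoding in order.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not the actual url2unicode() from wikipedia.py.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    raw = unquote_to_bytes(title)  # "%E4" becomes the single byte 0xE4
    first_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as err:
            # Remember the first failure so it can be re-raised if
            # every candidate encoding fails.
            if first_error is None:
                first_error = err
    raise first_error


# A UTF-8 escaped title and an ISO-8859-1 escaped title decode to the
# same text when both encodings are allowed.
utf8_title = url_to_unicode("Gel%C3%A4nde")
latin1_title = url_to_unicode("Gel%E4nde")
```

With only UTF-8 in the list, the ISO-8859-1 escaped title raises UnicodeDecodeError, which corresponds to the failure in the traceback. Because ISO-8859-1 accepts any byte sequence, it only makes sense as the last entry: anything listed after it would never be tried.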
----------------------------------------------------------------------