Bugs item #1893001, was opened at 2008-02-13 19:53
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=189300…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Sporadic UnicodeDecodeError for German umlauts (utf8)
Initial Comment:
The very annoying error message below sometimes stops my bot from working. Test it on
de.wiktionary.org with the following articles containing German umlauts (one will
certainly give this error; from skipfile):
[[Aussprachewörterbuch]]
[[Fachwörterbuch]]
[[Fräswerkzeug]]
[[Gelände]]
[[Geländer]]
[[Geländewagen]]
[[Gelübde]]
[[Herkunftswörterbuch]]
Error message:
Traceback (most recent call last):
  File "interwiki.py", line 1572, in ?
    bot.run()
  File "interwiki.py", line 1347, in run
    self.queryStep()
  File "interwiki.py", line 1321, in queryStep
    self.oneQuery()
  File "interwiki.py", line 1317, in oneQuery
    subject.workDone(self)
  File "interwiki.py", line 659, in workDone
    (skip, alternativePage) = self.disambigMismatch(page)
  File "interwiki.py", line 522, in disambigMismatch
    if self.originPage.isDisambig() and not page.isDisambig():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 983, in isDisambig
    for tn in self.templates():
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1561, in templates
    return [template for (template, param) in self.templatesWithParams()]
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 1609, in templatesWithParams
    name = Page(self.site(), name).title()
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 317, in __init__
    t = url2unicode(t, site = insite, site2 = site)
  File "/home/x/PyWikipediaBot-2008-01-24/wikipedia.py", line 3415, in url2unicode
    raise firstException
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13: invalid data
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-11-06 14:05
Message:
Given how much time has passed since your last remark: is this bug still valid?
If there is no response, I will close this request after a pending period of two weeks.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2008-05-16 06:38
Message:
Logged In: NO
Problem still there (r5380) at wikt:de:
[[Inkommensurabilität]]
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-02-13 21:48
Message:
Logged In: YES
user_id=1327030
Originator: NO
One template uses a URL-encoded character that seems to be encoded in
ISO-8859-1 rather than UTF-8, and this seems to crash the bot. I'm not sure
what the best way to handle it is; possibly adding ISO-8859-1 as another
encoding for wiktionary:de? (wikipedia:de already has it as a "historical
encoding" and parses the template successfully.)
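
The fallback idea above can be sketched roughly as follows. This is only an
illustration in modern Python 3, not the actual url2unicode code from
wikipedia.py: the function name decode_percent_encoded and its encodings
parameter are hypothetical, and the real bot takes its encoding list from the
site/family configuration. The sketch tries UTF-8 first, falls back to
ISO-8859-1, and re-raises the first error if nothing works.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_encoded(title, encodings=("utf-8", "iso-8859-1")):
    """Decode %XX escapes in a title, trying each encoding in order.

    Hypothetical helper for illustration; not part of PyWikipediaBot.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    # Turn the %XX escapes into raw bytes without guessing an encoding yet.
    raw = unquote_to_bytes(title)
    first_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as err:
            # Remember the first failure so it can be reported if all fail.
            if first_error is None:
                first_error = err
    raise first_error


# UTF-8 escapes and ISO-8859-1 escapes both decode to the same title.
print(decode_percent_encoded("W%C3%B6rterbuch"))
print(decode_percent_encoded("W%F6rterbuch"))
```

With only UTF-8 in the list, the second call raises UnicodeDecodeError, which
matches the failure mode in the traceback. One caveat: ISO-8859-1 accepts any
byte sequence, so it only makes sense as the last fallback.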
----------------------------------------------------------------------