[Pywikipedia-l] [ pywikipediabot-Bugs-1852173 ] utf-8 coding problem, kills weblinkchecker
SourceForge.net
noreply at sourceforge.net
Tue Dec 18 18:13:28 UTC 2007
Bugs item #1852173, was opened at 2007-12-17 10:57
Message generated for change (Comment added) made by rotemliss
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1852173&group_id=93107
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: utf-8 coding problem, kills weblinkchecker
Initial Comment:
Weblinkchecker chokes in many instances when reading Special:Allpages of the ksh Wikipedia. It claims to see non-Unicode data, which is unlikely to be the case. I have not dug into the code in detail yet, and I cannot identify the offending byte sequences at the moment.
Here is a copy of a Linux command line and the output it generated:
$~> python weblinkchecker.py -putthrottle:300 -start:00er -v -family:wikipedia -lang:ksh
Checked for running processes. 1 processes currently running, including the current process.
Pywikipediabot (r4720 (wikipedia.py), Dec 15 2007, 18:57:27)
Python 2.4.4 (#2, Aug 16 2007, 00:34:54)
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)]
Retrieving Allpages special page for wikipedia:ksh from 00er, namespace 0
Retrieving Allpages special page for wikipedia:ksh from 00er%20Joare%20%28Watt%20%C4%97%C3%9F%C3%9F%20datt%3F%29%21, namespace 0
Retrieving Allpages special page for wikipedia:ksh from 00er%2520Joare%2520%2528Watt%2520%25C4%2597%25C3%259F%25C3%259F%2520datt%253F%2529%2521, namespace 0
DBG> BUG: Non-unicode passed to wikipedia.output without decoder!
File "threading.py", line 442, in __bootstrap
self.run()
File "/home/purodha/pywikipedia/pagegenerators.py", line 632, in run
wikipedia.output(str(e))
File "/home/purodha/pywikipedia/wikipedia.py", line 5351, in output
print traceback.print_stack()
None
DBG> Attempting to recover, but please report this problem
Couldn't extract allpages special page. Make sure you're using MonoBook skin.
Saving history...
I had made sure that the user [[:ksh:User:Weblinkchcker]] was logged in, using the MonoBook skin and the English interface language.
I could not verify that weblinkchecker actually uses this user account when it is only reading. A test revealed no apparent difference in behaviour when I renamed login-data/wikipedia-ksh-Weblinkchecker-login.data to something else.
If there are questions, I am prepared to provide more information once I know where to look.
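The command-line log above shows the Allpages start parameter being percent-encoded once and then encoded a second time. As an illustrative sketch only (using Python 3's urllib.parse for clarity, not the bot's actual Python 2 code), the double encoding can be reproduced like this:

```python
from urllib.parse import quote

# The ksh page title visible (decoded) in the log above.
title = "00er Joare (Watt \u0117\u00df\u00df datt?)!"

once = quote(title)   # correct: one round of percent-encoding
twice = quote(once)   # the bug: encoding the already-encoded value again

print(once)
# 00er%20Joare%20%28Watt%20%C4%97%C3%9F%C3%9F%20datt%3F%29%21
print(twice)
# 00er%2520Joare%2520%2528Watt%2520%25C4%2597%25C3%259F%25C3%259F%2520datt%253F%2529%2521
```

The second value matches the third "Retrieving Allpages" line in the log: every "%" of the first encoding has itself been encoded as "%25".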
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-18 20:13
Message:
Logged In: YES
user_id=1327030
Originator: NO
The problem was that you hit a very special case in Allpages: the batch
did not contain a single page of the kind you requested
(redirect/non-redirect), so instead of advancing to the next page, the bot
double-encoded the current one. Fixed this in r4734. Note that the UTF-8
problem is not related to this bug (the "Couldn't extract allpages special
page. Make sure you're using MonoBook skin." message is the related one,
and its cause was a byte string, rather than a unicode object, being
passed to wikipedia.output), but it was fixed in r4733.
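As a hedged sketch of the second fix (the actual pywikipedia code is Python 2, where the distinction is str vs. unicode; this Python 3 analogue uses hypothetical output() and safe_output() helpers, not the bot's real API), the r4733 class of fix amounts to decoding byte strings before they reach the output routine:

```python
import sys

def output(text):
    # Analogue of wikipedia.output's contract: callers must pass decoded
    # text. Raw byte strings trigger the "Non-unicode passed to
    # wikipedia.output without decoder!" complaint seen in the log above.
    if isinstance(text, bytes):
        raise TypeError("Non-unicode passed to output without decoder")
    sys.stdout.write(text + "\n")

def safe_output(obj, encoding="utf-8"):
    # The fix: decode byte strings (e.g. exception text) before output,
    # instead of passing the raw encoded bytes straight through.
    if isinstance(obj, bytes):
        obj = obj.decode(encoding)
    output(str(obj))
```

With this, safe_output(b"caf\xc3\xa9") prints the decoded text, while passing the same bytes directly to output() raises the error the traceback in the original report corresponds to.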
----------------------------------------------------------------------