[Pywikipedia-l] [ pywikipediabot-Bugs-1852173 ] utf-8 coding problem, kills weblinkchecker

SourceForge.net noreply at sourceforge.net
Mon Dec 17 08:57:00 UTC 2007


Bugs item #1852173, was opened at 2007-12-17 08:57
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1852173&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: utf-8 coding problem, kills weblinkchecker

Initial Comment:
Weblinkchecker chokes in many cases when reading Special:Allpages of the ksh Wikipedia. It claims to see non-unicode data, which is unlikely to be the case. I have not looked at the code in detail yet, and I cannot identify the offending byte sequences at the moment.
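
For context on what the "non-unicode" complaint refers to: the traceback in the output below shows pagegenerators.py calling wikipedia.output(str(e)). Under Python 2, str(e) yields a byte string rather than a unicode object, which appears to be what wikipedia.output objects to. The following is only a sketch of the general remedy (decode bytes to text before printing). It is written in Python 3 syntax, the helper name to_text and its fallback order are my own assumptions, and it is not pywikipedia code.

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    """Return `value` as text, decoding byte strings if necessary.

    Hypothetical helper for illustration; not part of pywikipedia.
    """
    if isinstance(value, str):
        return value  # already text, nothing to do
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # No candidate worked: keep going, marking undecodable bytes.
    return value.decode("ascii", errors="replace")


# Bytes as they appear (percent-encoded) in the page title from the log.
print(to_text(b"Watt \xc4\x97\xc3\x9f\xc3\x9f datt?"))
```

Trying UTF-8 first and a single-byte encoding second is one possible design choice; which encodings are appropriate for the bot is something I have not checked.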

Here is the Linux command line I used, and the output it generated:

$~> python weblinkchecker.py -putthrottle:300 -start:00er -v -family:wikipedia -lang:ksh
Checked for running processes. 1 processes currently running, including the current process.
Pywikipediabot  (r4720 (wikipedia.py), Dec 15 2007, 18:57:27)
Python 2.4.4 (#2, Aug 16 2007, 00:34:54) 
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)]
Retrieving Allpages special page for wikipedia:ksh from 00er, namespace 0
Retrieving Allpages special page for wikipedia:ksh from 00er%20Joare%20%28Watt%20%C4%97%C3%9F%C3%9F%20datt%3F%29%21, namespace 0
Retrieving Allpages special page for wikipedia:ksh from 00er%2520Joare%2520%2528Watt%2520%25C4%2597%25C3%259F%25C3%259F%2520datt%253F%2529%2521, namespace 0
DBG> BUG: Non-unicode passed to wikipedia.output without decoder!
  File "threading.py", line 442, in __bootstrap
    self.run()
  File "/home/purodha/pywikipedia/pagegenerators.py", line 632, in run
    wikipedia.output(str(e))
  File "/home/purodha/pywikipedia/wikipedia.py", line 5351, in output
    print traceback.print_stack()
None
DBG> Attempting to recover, but please report this problem
Couldn't extract allpages special page. Make sure you're using MonoBook skin.
Saving history...
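
One observation on the three "Retrieving Allpages" lines above: the third title looks like the second one percent-encoded a second time (every % has become %25). I have not traced where in the bot this happens, so the following is only a sketch that reproduces the two strings from the log. It uses Python 3's urllib.parse rather than the Python 2.4 modules the bot runs on, and the variable names are mine.

```python
from urllib.parse import quote, unquote

# Page title obtained by percent-decoding the second log line.
title = "00er Joare (Watt \u0117\u00df\u00df datt?)!"

# Encoding once gives the form seen in the second "Retrieving" line.
once = quote(title, safe="")

# Encoding the already-encoded string again gives the third line.
twice = quote(once, safe="")

print(once)
print(twice)

# Two rounds of decoding are needed to get back to the title.
print(unquote(unquote(twice)) == title)
```

If the bot does encode the title twice, the wiki would be asked for a page name that does not exist, which might explain why the Allpages page could not be parsed. That is a guess on my part.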

I had made sure that the user [[:ksh:User:Weblinkchcker]] was logged in, using the MonoBook skin and the English interface language.

I could not verify whether weblinkchecker actually uses this user account when it is only reading pages. A test showed no apparent difference in behaviour when I renamed login-data/wikipedia-ksh-Weblinkchecker-login.data to something else.

If there are questions, I am prepared to provide more info, once I know where to look.

----------------------------------------------------------------------



