Support Requests item #1871836, was opened at 2008-01-15 12:27
Message generated for change (Comment added) made by rotemliss
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603139&aid=1871836...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: André Malafaya Baptista (malafaya)
Assigned to: Nobody/Anonymous (nobody)
Summary: Encoding error while processing [[en:Chişinău]]
Initial Comment:
I'm almost sure this is not a bug, so I'm putting it here in Support Requests. When I try to process the page [[en:Chişinău]] (with a bot account on 'en'), I get the following result:
X:>interwiki.py -lang:en Chi%C5%9Fin%C4%83u
Checked for running processes. 2 processes currently running, including the current process.
Getting 1 pages from wikipedia:en...
[[Chisinau]]: [[en:Chisinau]] gives new interwiki [[lt:Kisiniovas]]
[[Chisinau]]: [[en:Chisinau]] gives new interwiki [[lv:Kisineva]]
(...output omitted deliberately...)
======Post-processing [[en:Chisinau]]======
Updating links on page [[en:Chisinau]].
Exception in Page constructor
Dump en (wikipedia) saved
Traceback (most recent call last):
  File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1606, in <module>
    bot.run()
  File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1381, in run
    self.queryStep()
  File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1360, in queryStep
    subj.finish(self)
  File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 967, in finish
    if self.replaceLinks(page, new, bot):
  File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1010, in replaceLinks
    ignorepage = wikipedia.Page(page.site(), iw.groups()[0])
  File "D:\Work\pywikipediabot-HEAD\pywikipedia\wikipedia.py", line 425, in __init__
    % (site, title, insite, defaultNamespace)
  File "D:\Program Files\Python\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 28-34: character maps to <undefined>
================
I believe it's an invalid UTF-8 byte sequence somewhere in the page, but I'd like someone more experienced to verify this. Thanks.
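
The last frame of the traceback is in cp850.py, which suggests the failure happens while encoding text for the Windows console rather than while decoding the page. A minimal sketch in Python 3 (the bot in this report ran on Python 2, so this only illustrates the mechanism), assuming the title is the one given on the command line; `can_encode` is a hypothetical helper, not part of pywikipedia:

```python
from urllib.parse import unquote

# The command-line argument is percent-encoded UTF-8; decode it to text.
title = unquote("Chi%C5%9Fin%C4%83u")


def can_encode(text, codec):
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The title is valid Unicode and round-trips through UTF-8.
utf8_ok = can_encode(title, "utf-8")
# Code page 850 has no mapping for U+015F or U+0103, so encoding fails
# with the same 'charmap' error class as in the traceback.
cp850_ok = can_encode(title, "cp850")
```

If `utf8_ok` is true and `cp850_ok` is false, the bytes of the title are fine and only the console code page is too narrow for it.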
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-01-16 14:09
Message: Logged In: YES user_id=1327030 Originator: NO
r4897 and r4898 should both fix that. Do they?
----------------------------------------------------------------------
Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 20:21
Message: Logged In: YES user_id=1037345 Originator: YES
I tried r4893 (latest) and everything went exactly the same way. The call stack and output are exactly the same:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 28-34: character maps to <undefined>
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-01-15 18:21
Message: Logged In: YES user_id=1327030 Originator: NO
This may be fixed in r4893 (I tried to fix a possible unicode problem in output, and a possible KeyError for an obsolete site). If it isn't, does the output change?
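
One common way to avoid a unicode problem in output is to encode with a replacement policy before text reaches the console. The sketch below is an illustration in Python 3 and is not what r4893 actually changed; `console_safe` and its default code page are assumptions made for the example:

```python
def console_safe(text, console_encoding="cp850"):
    """Hypothetical helper: make `text` printable on a limited console.

    Characters missing from the console code page are replaced with '?'
    instead of raising UnicodeEncodeError.
    """
    raw = text.encode(console_encoding, errors="replace")
    return raw.decode(console_encoding)


# U+015F and U+0103 are not in cp850, so each becomes '?'.
safe_title = console_safe("Chi\u015fin\u0103u")
```

The trade-off is that the displayed title loses information, so a helper like this is only suitable for messages shown to the user, never for the page titles the bot works with internally.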
----------------------------------------------------------------------
pywikipedia-l@lists.wikimedia.org