Bugs item #3610770, was opened at 2013-04-13 12:55
Message generated for change (Tracker Item Submitted) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3610770...

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Merlijn S. van Deen (valhallasw)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker URL unicode problems

Initial Comment:
As reported by Anima in https://sourceforge.net/tracker/?func=detail&aid=3602096&group_id=93...
Weblinkchecker jumps through some strange unicode hoops. There is no such thing as a unicode URL - URLs are /always/ urlencoded UTF-8 strings, so:
>>> import urllib
>>> urllib.quote(u"ö".encode('utf-8'))
'%C3%B6'
anything else is *wrong*, including things like asking what encoding the web server uses: that is only relevant for decoding the page *text*.
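
For illustration, a minimal sketch of what that implies for a path that mixes already-escaped bytes with raw unicode characters, as in the test case in this report. The helper name quote_path is hypothetical and not part of weblinkchecker; treating '%' as a safe character is an assumption made here so that existing escapes are not encoded twice (a literal '%' in the input would also be left alone). The import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: quote_path is a hypothetical helper, not part of weblinkchecker.
try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def quote_path(path):
    """Percent-encode a unicode URL path as UTF-8.

    Assumption: '%' is treated as safe so that escapes which are already
    present (e.g. '%D0%A0') are not encoded a second time.
    """
    # Always encode as UTF-8; the server's page charset plays no role here.
    return quote(path.encode('utf-8'), safe='/%')


print(quote_path(u"/ö"))          # /%C3%B6
print(quote_path(u"/%C3%B6 ö/"))  # /%C3%B6%20%C3%B6/
```

With something along these lines the server's default charset never needs to be looked up for the URL itself, and the result is always plain ASCII.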
Basic test case:
>>> import weblinkchecker
>>> lc = weblinkchecker.LinkChecker(u"http://svoya-igra.org/%D0%A0%D0%B0%D0%B9%D0%BA%D0%BE%D0%B2 Александр Вадимович/")
Contacting server svoya-igra.org to find out its default encoding...
Error retrieving server's default charset. Using ISO 8859-1.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "weblinkchecker.py", line 218, in __init__
    self.changeUrl(url)
  File "weblinkchecker.py", line 275, in changeUrl
    self.path = unicode(urllib.quote(self.path.encode(encoding)))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-6: ordinal not in range(256)

valhallasw@lisilwen:~/src/pywikipedia/trunk/pywikipedia$ python version.py
Pywikipedia [svn+ssh] valhallasw@trunk/pywikipedia (r11368, 2013/04/13, 08:16:45, ok)
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org