Bugs item #3610770, was opened at 2013-04-13 12:55
Message generated for change (Tracker Item Submitted) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=361077…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Merlijn S. van Deen (valhallasw)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker URL unicode problems
Initial Comment:
As reported by Anima in
https://sourceforge.net/tracker/?func=detail&aid=3602096&group_id=9…
Weblinkchecker jumps through some strange unicode hoops. There is no such thing as a
unicode URL - URLs are /always/ urlencoded UTF-8 strings, so:
>>> urllib.quote(u"ö".encode('utf-8'))
'%C3%B6'
Anything else is *wrong*, including things like asking what encoding the web server uses:
that encoding is only relevant for decoding the page *text*, not for building the URL.
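A minimal sketch of the approach described above, assuming the path is held as a unicode string and should always be percent-encoded as UTF-8. The helper name quote_path is hypothetical and is not part of weblinkchecker.py; the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2, as used in this report
except ImportError:
    from urllib.parse import quote    # Python 3

def quote_path(path):
    """Percent-encode a unicode URL path as UTF-8 (hypothetical helper).

    The server's page charset is deliberately not consulted here.
    """
    return quote(path.encode('utf-8'), safe='/')

quoted = quote_path(u"/wiki/\u00f6")  # u"\u00f6" is "ö"
```

With this, the web server's default charset never needs to be looked up when constructing the request path.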
Basic test case:
Contacting server svoya-igra.org to find out its default encoding...
Error retrieving server's default charset. Using ISO 8859-1.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "weblinkchecker.py", line 218, in __init__
    self.changeUrl(url)
  File "weblinkchecker.py", line 275, in changeUrl
    self.path = unicode(urllib.quote(self.path.encode(encoding)))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-6: ordinal not in range(256)
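The failure above can be reproduced in isolation. This is only a sketch: the Cyrillic path below is a made-up stand-in, since the actual URL used in the test case is not shown in this report. It illustrates that encoding such a path with the ISO 8859-1 fallback raises UnicodeEncodeError, while encoding it as UTF-8 before quoting works.

```python
try:
    from urllib import quote          # Python 2, as used in this report
except ImportError:
    from urllib.parse import quote    # Python 3

# Hypothetical path containing Cyrillic characters (not the real URL).
path = u"/\u0438\u0433\u0440\u0430"

# The ISO 8859-1 fallback cannot represent these characters.
try:
    path.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Encoding as UTF-8 first always succeeds for a unicode string.
utf8_quoted = quote(path.encode('utf-8'))
```

Here latin1_ok ends up False and utf8_quoted holds the percent-encoded UTF-8 bytes of the path.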
valhallasw@lisilwen:~/src/pywikipedia/trunk/pywikipedia$ python version.py
Pywikipedia [svn+ssh] valhallasw@trunk/pywikipedia (r11368, 2013/04/13, 08:16:45, ok)
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok