https://bugzilla.wikimedia.org/show_bug.cgi?id=55145
Web browser: ---
Bug ID: 55145
Summary: weblinkchecker URL unicode problems
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs(a)lists.wikimedia.org
Reporter: legoktm.wikipedia(a)gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from:
http://sourceforge.net/p/pywikipediabot/bugs/1613/
Reported by: valhallasw
Created on: 2013-04-13 19:55:05
Subject: weblinkchecker URL unicode problems
Original description:
As reported by Anima in
https://sourceforge.net/tracker/?func=detail&aid=3602096&group_id=…
Weblinkchecker jumps through some strange unicode hoops. There is no such thing
as a unicode URL - URLs are /always/ urlencoded UTF-8 strings, so:
>>> urllib.quote(u"ö".encode('utf-8'))
'%C3%B6'
anything else is *wrong*, including things like asking what encoding the web
server uses: that is only relevant for decoding the page *text*.
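To illustrate the point above for a whole URL, here is a minimal sketch in Python 3, where `urllib.parse.quote` is the counterpart of Python 2's `urllib.quote`. The helper name `percent_encode_path` is hypothetical and not part of weblinkchecker; it only touches the path component and assumes the input is not already percent-encoded.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def percent_encode_path(url):
    """Return `url` with its path percent-encoded as UTF-8.

    Hypothetical helper for illustration only. It assumes the path is
    not already percent-encoded (a literal "%" would be escaped again).
    """
    parts = urlsplit(url)
    # quote() encodes str input as UTF-8 by default and leaves "/" alone,
    # so the server's charset never enters the picture.
    encoded_path = quote(parts.path, safe="/")
    return urlunsplit(parts._replace(path=encoded_path))


url = "http://svoya-igra.org/Райков Александр Вадимович/"
encoded = percent_encode_path(url)
print(encoded)

# Decoding the percent-escapes as UTF-8 gives back the original path.
print(unquote(urlsplit(encoded).path) == urlsplit(url).path)
```

The result is pure ASCII, so it can be sent to the server without knowing anything about the encoding the server uses for page text.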
Basic test case:
>>> import weblinkchecker
>>> lc = weblinkchecker.LinkChecker(u"http://svoya-igra.org/Райков Александр Вадимович/")
Contacting server svoya-igra.org to find out its default encoding...
Error retrieving server's default charset. Using ISO 8859-1.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "weblinkchecker.py", line 218, in __init__
    self.changeUrl(url)
  File "weblinkchecker.py", line 275, in changeUrl
    self.path = unicode(urllib.quote(self.path.encode(encoding)))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-6: ordinal not in range(256)
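The failure can be reproduced without weblinkchecker or network access. The sketch below (Python 3; the variable names are illustrative and not taken from weblinkchecker.py) encodes the same path with Latin-1, which fails on the Cyrillic characters, and then with UTF-8, which can represent them.

```python
from urllib.parse import quote, unquote

path = "/Райков Александр Вадимович/"

# Encoding with the fallback charset fails: Latin-1 has no Cyrillic.
try:
    path.encode("latin-1")
    latin1_error = None
except UnicodeEncodeError as exc:
    latin1_error = exc
    # exc.start is inclusive and exc.end is exclusive, so the message
    # reports the failing range as start to end - 1.
    print("latin-1 failed at positions %d-%d" % (exc.start, exc.end - 1))

# Encoding as UTF-8 before quoting works regardless of the server charset.
quoted = quote(path.encode("utf-8"))
print(quoted)
```

Under these assumptions, the change suggested by the report would amount to always passing UTF-8 bytes to the quoting function instead of bytes in the server's default encoding.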
valhallasw@lisilwen:~/src/pywikipedia/trunk/pywikipedia$ python version.py
Pywikipedia [svn+ssh] valhallasw@trunk/pywikipedia (r11368, 2013/04/13, 08:16:45, ok)
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok