Hi.
Does PWB have issues with decoding URL strings?
Try this script:
from __future__ import absolute_import, unicode_literals
import re, urllib
import pywikibot

mylist = [
    u"Åge Hovengen",
    u"Åge Konradsen",
    u"Åge Ramberg",
]

for a in mylist:
    ssite = pywikibot.getSite("en")
    spage = pywikibot.Page(ssite, a)
    text = spage.get()
    m0 = re.search(
        ur"\{\{\s*Stortingetbio\s*\|\s*(?:id=)?\s*([^\s}\|]+)\s*[\|\}]",
        text, flags=re.IGNORECASE)
    if m0:
        m = m0.group(1)
        test1 = urllib.unquote(m)
        test2 = urllib.unquote_plus(m)
        test3 = m.decode('utf8')
        test4 = m.encode('utf8')
        pywikibot.output(test1)
        pywikibot.output(test2)
        pywikibot.output(test3)
        pywikibot.output(test4)
For me, none of these decode %c3%85 to Å. On http://repl.it/Izdw/2,
however, you can see that pure Python decodes that sequence with
urllib.unquote and urllib.unquote_plus.
Is this a PWB bug or what?
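
For comparison, here is a minimal sketch of the decoding I expect. It assumes that the ID captured from the template is a unicode string and that the percent escapes are UTF-8 bytes, which I have not verified. The helper name percent_decode_utf8 is made up for illustration and is not part of PWB.

```python
from __future__ import unicode_literals

try:
    # Python 3: explicit bytes-returning variant
    from urllib.parse import unquote_to_bytes
except ImportError:
    # Python 2: urllib.unquote returns a byte string when given a byte string
    from urllib import unquote as unquote_to_bytes


def percent_decode_utf8(value):
    """Hypothetical helper: percent-decode a unicode string as UTF-8.

    Assumption: the escapes in `value` (e.g. %c3%85) encode UTF-8 bytes.
    """
    # Unquote on bytes so that each %XX becomes one raw byte ...
    raw = unquote_to_bytes(value.encode('utf-8'))
    # ... and only then interpret the whole byte sequence as UTF-8.
    return raw.decode('utf-8')
```

With this, percent_decode_utf8("%c3%85ge") gives "Åge", which is the result I was hoping to get from test1 and test2 in the script.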