Hi. Does PWB have issues with decoding URL strings?
Try this script:

from __future__ import absolute_import, unicode_literals

import re
import urllib

import pywikibot

mylist = [
    u"Åge Hovengen",
    u"Åge Konradsen",
    u"Åge Ramberg",
]

for a in mylist:
    ssite = pywikibot.getSite("en")
    spage = pywikibot.Page(ssite, a)
    text = spage.get()
    # Capture the id parameter of {{Stortingetbio|id=...}}; the pipe after
    # the template name is escaped so it matches a literal "|".
    m0 = re.search(ur"{{\s*Stortingetbio\s*\|\s*(?:id=)?\s*([^\s}|]+)\s*[|}]",
                   text, flags=re.IGNORECASE)
    if m0:
        m = m0.group(1)
        # Four attempts at turning the percent-encoded id into readable text.
        test1 = urllib.unquote(m)
        test2 = urllib.unquote_plus(m)
        test3 = m.decode('utf8')
        test4 = m.encode('utf8')
        pywikibot.output(test1)
        pywikibot.output(test2)
        pywikibot.output(test3)
        pywikibot.output(test4)
For me it doesn't decode %c3%85 to Å, while at http://repl.it/Izdw/2 you can see that pure Python can decode that sequence with urllib.unquote and urllib.unquote_plus. Is this a PWB bug, or something else?
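
For reference, here is a minimal standalone sketch (no pywikibot) of the result I expect. The helper name percent_decode is my own and not part of any library. Encoding to ASCII before unquoting is only an assumption on my part about where the difference might come from, since the page text is unicode rather than a byte string. It is not a confirmed diagnosis.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

try:
    # Python 3: returns the raw bytes behind the percent escapes.
    from urllib.parse import unquote_to_bytes
except ImportError:
    # Python 2: urllib.unquote returns a byte string when given one.
    from urllib import unquote as unquote_to_bytes


def percent_decode(value):
    """Percent-decode value and read the resulting bytes as UTF-8.

    Hypothetical helper for illustration only.
    """
    raw = value.encode('ascii')  # percent-encoded text is plain ASCII
    return unquote_to_bytes(raw).decode('utf-8')


if __name__ == '__main__':
    print(percent_decode('%c3%85'))
```

If this prints Å while the script above does not, the difference would seem to lie in how the matched string is passed to urllib.unquote rather than in the page content itself.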