Decoding strings issue in PWB - pywikibot

23 Jun 2017

Hi.
Does PWB have issues with decoding URL strings?
Try this script:
from __future__ import absolute_import, unicode_literals
import re, urllib
import pywikibot
mylist = [
    u"Åge Hovengen",
    u"Åge Konradsen",
    u"Åge Ramberg",
]

for a in mylist:
    ssite = pywikibot.getSite("en")
    spage = pywikibot.Page(ssite, a)
    text = spage.get()
    m0 = re.search(ur"{{\s*Stortingetbio\s*\|\s*(?:id=)?\s*([^\s}|]+)\s*[|}]", text, flags=re.IGNORECASE)
    if m0:
        m = m0.group(1)
        test1 = urllib.unquote(m)
        test2 = urllib.unquote_plus(m)
        test3 = m.decode('utf8')
        test4 = m.encode('utf8')
        pywikibot.output(test1)
        pywikibot.output(test2)
        pywikibot.output(test3)
        pywikibot.output(test4)
It doesn't decode %c3%85 to Å for me, while at http://repl.it/Izdw/2 you can see that plain Python decodes that sequence with urllib.unquote and urllib.unquote_plus.
Is this a PWB bug, or something else?
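
For comparison, here is a minimal sketch of one possible explanation, assuming the script runs under Python 2 and that `m` is a unicode object (pywikibot returns page text as unicode). In Python 2, urllib.unquote applied to a unicode string turns each %XX escape into a single code point instead of decoding the bytes as UTF-8, so %c3%85 comes out as two characters rather than Å. Encoding to bytes first and decoding afterwards avoids that. The helper name `unquote_utf8` is made up for this illustration and is not part of pywikibot or urllib.

```python
import sys

PY2 = sys.version_info[0] == 2
if PY2:
    from urllib import unquote          # Python 2 location
else:
    from urllib.parse import unquote    # Python 3 location

text_type = type(u"")  # unicode on Python 2, str on Python 3


def unquote_utf8(value):
    """Percent-decode value and return text, reading the escapes as UTF-8.

    Hypothetical helper for illustration only.
    """
    if PY2:
        # Python 2: go through a byte string so the %XX escapes are
        # reassembled into UTF-8 bytes before being decoded to text.
        if isinstance(value, text_type):
            value = value.encode("utf-8")
        return unquote(value).decode("utf-8")
    # Python 3: unquote() already decodes the escapes as UTF-8 by default.
    return unquote(value)


decoded = unquote_utf8(u"%c3%85ge Hovengen")
print(decoded)
```

If this assumption holds, replacing `urllib.unquote(m)` in the script with something like `unquote_utf8(m)` should produce the expected name; the repl.it test probably worked because it passed a plain byte string rather than a unicode object.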