The problem seems to be in the API result:

http://techbase.kde.org/api.php?action=query&list=allpages&apfrom=Localization/fy/Fryske_kompj

This shows two results with different page IDs: the correct Localization/fy/Fryske kompjûterwurden and the incorrect Localization/fy/Fryske kompj�terwurden. More specifically:

http://techbase.kde.org/api.php?action=query&prop=info&pageids=9156|6713&inprop=url

One of the two titles is Fryske kompjûterwurden encoded as latin-1. That byte sequence is not valid utf-8, so decoding it as utf-8 yields a � character (U+FFFD). You can see this from the URL:

http://techbase.kde.org/Localization/fy/Fryske_kompj%FBterwurden <-- %FB = û in latin-1,

while

http://techbase.kde.org/Localization/fy/Fryske_kompj%C3%BBterwurden <-- %C3%BB = û in utf-8.
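
To illustrate the difference, here is a small Python sketch. It is my own example, not code from pywikibot or MediaWiki, and the variable names are made up. It percent-decodes both URL fragments to raw bytes and shows that only the second one is valid utf-8:

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the two page names from the URLs above into raw bytes.
latin1_bytes = unquote_to_bytes("Fryske_kompj%FBterwurden")
utf8_bytes = unquote_to_bytes("Fryske_kompj%C3%BBterwurden")

# Each byte string gives the same title when decoded with its own encoding.
latin1_title = latin1_bytes.decode("latin-1")
utf8_title = utf8_bytes.decode("utf-8")

# The latin-1 bytes are not valid utf-8: 0xFB cannot start a utf-8 sequence.
# With errors="replace" the decoder substitutes U+FFFD for the bad byte.
mangled = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_title == utf8_title)  # same text, different bytes
print(repr(mangled))               # contains "\ufffd" where the û was
```

The last value is the kind of title that trips the "illegal char (\uFFFD)" check in the traceback below.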

I think this qualifies as an API bug, as the rest of MediaWiki seems to be able to cope with the incorrect encoding. I'll try to get one of the API developers to take a look.


[bugs:#1658] “Title contains illegal char (\uFFFD)” with existing page

Status: open-accepted
Labels: character encoding
Created: Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
Last Updated: Sun Aug 25, 2013 03:24 PM UTC
Owner: nobody

This is happening with the following existing page: http://techbase.kde.org/Localization/fy/Fryske_kompjûterwurden

Traceback (most recent call last):
  File "maintenance.py", line 81, in <module>
    main()
  File "maintenance.py", line 77, in main
    bot.run()
  File "/home/gallaecio/fontes/rodela/scripts/replace.py", line 326, in run
    for page in self.generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 799, in PreloadingGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 749, in DuplicateFilterPageGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 706, in __iter__
    yield self.result(item)
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 780, in result
    p = pywikibot.Page(self.site, pagedata['title'], pagedata['ns'])
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*args, **kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*args, **kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 77, in __init__
    self._link = Link(title, source=source, defaultNamespace=ns)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 2958, in __init__
    raise pywikibot.Error("Title contains illegal char (\uFFFD)")

