The problem seems to be in the API result:
http://techbase.kde.org/api.php?action=query&list=allpages&apfrom=Lo...
This shows two results: the correct Localization/fy/Fryske kompjûterwurden and the incorrect Localization/fy/Fryske kompj�terwurden, with *different* page ids. More specifically:
http://techbase.kde.org/api.php?action=query&prop=info&pageids=9156%...
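One way to spot such pairs in an allpages result is to treat each U+FFFD as a one-character wildcard and look for an intact title that matches. This works here because one undecodable latin-1 byte always stood for exactly one character. The minimal Python sketch below assumes the entries have already been parsed from the API's JSON into dicts with `pageid`, `ns` and `title`. The name `find_mojibake_twins` and the page ids in the example are hypothetical; they are not the real ids from the query above.

```python
import re


def find_mojibake_twins(allpages):
    """Pair each title containing U+FFFD with an intact title it could stand for.

    `allpages` is a list of dicts shaped like the entries of a
    list=allpages response, e.g. {"pageid": ..., "ns": ..., "title": ...}.
    Returns (broken_pageid, intact_pageid) tuples.
    """
    twins = []
    for broken in allpages:
        if "\ufffd" not in broken["title"]:
            continue
        # Each U+FFFD replaced one undecodable latin-1 byte, i.e. one character,
        # so turn it into a single-character wildcard and escape everything else.
        pattern = re.compile(
            ".".join(re.escape(part) for part in broken["title"].split("\ufffd"))
        )
        for candidate in allpages:
            if "\ufffd" in candidate["title"] or candidate["ns"] != broken["ns"]:
                continue  # only compare against intact titles in the same namespace
            if pattern.fullmatch(candidate["title"]):
                twins.append((broken["pageid"], candidate["pageid"]))
    return twins


# Hypothetical page ids; the real ones come from the prop=info query above.
pages = [
    {"pageid": 1, "ns": 0, "title": "Localization/fy/Fryske kompjûterwurden"},
    {"pageid": 2, "ns": 0, "title": "Localization/fy/Fryske kompj\ufffdterwurden"},
]
print(find_mojibake_twins(pages))  # [(2, 1)]
```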
One of the two titles is Fryske kompjûterwurden encoded as latin-1. In latin-1, û is the single byte 0xFB, which is not valid utf-8, so decoding the title as utf-8 turns that byte into the replacement character �. You can see this from the URLs:
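The decoding failure is easy to reproduce in plain Python 3 without any MediaWiki involved. The snippet below encodes the title both ways and then decodes the latin-1 bytes as utf-8, first strictly and then with `errors="replace"`:

```python
title = "Fryske kompjûterwurden"

latin1_bytes = title.encode("latin-1")  # 'û' becomes the single byte 0xFB
utf8_bytes = title.encode("utf-8")      # 'û' becomes the two bytes 0xC3 0xBB

# The utf-8 bytes round-trip cleanly.
assert utf8_bytes.decode("utf-8") == title

# 0xFB cannot start a valid utf-8 sequence, so a strict decode fails...
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)  # strict decode failed: invalid start byte

# ...and a lenient decode substitutes U+FFFD for the bad byte.
garbled = latin1_bytes.decode("utf-8", errors="replace")
print(garbled)  # Fryske kompj�terwurden
```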
http://techbase.kde.org/Localization/fy/Fryske_kompj%FBterwurden <-- %FB = û in latin-1,
while
http://techbase.kde.org/Localization/fy/Fryske_kompj%C3%BBterwurden <-- %C3%BB = û in utf-8.
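Both URL forms can be reproduced with Python's `urllib.parse`, which takes the byte encoding as a parameter when percent-encoding:

```python
from urllib.parse import quote, unquote

title = "Localization/fy/Fryske_kompjûterwurden"

print(quote(title))                      # Localization/fy/Fryske_kompj%C3%BBterwurden
print(quote(title, encoding="latin-1"))  # Localization/fy/Fryske_kompj%FBterwurden

# unquote() decodes as utf-8 with errors="replace" by default, so the
# latin-1 form comes back with U+FFFD in place of 'û'.
decoded = unquote("Localization/fy/Fryske_kompj%FBterwurden")
print(decoded)  # Localization/fy/Fryske_kompj�terwurden
```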
I think this qualifies as an API bug, as the rest of MediaWiki seems to be able to cope with the incorrect encoding. I'll try to get one of the API developers to take a look.
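The comment does not say how the rest of MediaWiki copes. One common strategy is to try utf-8 first and fall back to latin-1 when that fails. That strategy is assumed here and is not taken from the MediaWiki source. The sketch below applies it to percent-encoded titles, using `decode_title` as a hypothetical helper:

```python
from urllib.parse import unquote_to_bytes


def decode_title(encoded: str) -> str:
    """Decode a percent-encoded title, falling back to latin-1 if it is not utf-8."""
    raw = unquote_to_bytes(encoded)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid latin-1, so this branch cannot fail.
        return raw.decode("latin-1")


# Both URL forms map back to the same title.
print(decode_title("Fryske_kompj%FBterwurden"))     # Fryske_kompjûterwurden
print(decode_title("Fryske_kompj%C3%BBterwurden"))  # Fryske_kompjûterwurden
```

The fallback is only a heuristic. Because latin-1 accepts any byte sequence, the fallback itself never fails. However, a latin-1 title whose bytes happen to form valid utf-8 is silently misread: for example, `Ã»` (bytes C3 BB) would come back as `û`.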
---
** [bugs:#1658] “Title contains illegal char (\uFFFD)” with existing page**
**Status:** open-accepted **Labels:** character encoding **Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández **Last Updated:** Sun Aug 25, 2013 03:24 PM UTC **Owner:** nobody
This is happening with the following existing page: http://techbase.kde.org/Localization/fy/Fryske_kompj%C3%BBterwurden
Traceback (most recent call last):
  File "maintenance.py", line 81, in <module>
    main()
  File "maintenance.py", line 77, in main
    bot.run()
  File "/home/gallaecio/fontes/rodela/scripts/replace.py", line 326, in run
    for page in self.generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 799, in PreloadingGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 749, in DuplicateFilterPageGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 706, in __iter__
    yield self.result(item)
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 780, in result
    p = pywikibot.Page(self.site, pagedata['title'], pagedata['ns'])
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 77, in __init__
    self._link = Link(title, source=source, defaultNamespace=ns)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 2958, in __init__
    raise pywikibot.Error("Title contains illegal char (\uFFFD)")
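The last frames show where the run stops. `Link.__init__` in `page.py` rejects any title containing U+FFFD. It is reached from `result()` in `data/api.py` while the page generator is being consumed. Until the API side is fixed, a caller that has the raw API entries in hand could drop such titles before building `pywikibot.Page` objects. The sketch below works under that assumption. `skip_undecodable_titles` is a hypothetical helper and is not part of pywikibot.

```python
import logging

log = logging.getLogger(__name__)


def skip_undecodable_titles(pagedata_items):
    """Yield API page entries whose titles do not contain U+FFFD.

    Entries with U+FFFD are logged and skipped instead of being passed on
    to code that would reject them (as pywikibot's Link.__init__ does).
    """
    for pagedata in pagedata_items:
        title = pagedata.get("title", "")
        if "\ufffd" in title:
            log.warning("skipping page %s with undecodable title %r",
                        pagedata.get("pageid"), title)
            continue
        yield pagedata


# Hypothetical entries shaped like list=allpages results.
items = [
    {"pageid": 1, "ns": 0, "title": "Localization/fy/Fryske kompjûterwurden"},
    {"pageid": 2, "ns": 0, "title": "Localization/fy/Fryske kompj\ufffdterwurden"},
]
kept = [p["pageid"] for p in skip_undecodable_titles(items)]
print(kept)  # [1]
```

This only hides the symptom on the client side. The duplicate latin-1 entry still exists on the wiki.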
---
Sent from sourceforge.net because Pywikipedia-bugs@lists.wikimedia.org is subscribed to https://sourceforge.net/p/pywikipediabot/bugs/