The problem seems to be in the API result:
http://techbase.kde.org/api.php?action=query&list=allpages&apfrom=L…
This shows two results, the correct Localization/fy/Fryske kompjûterwurden and the
incorrect Localization/fy/Fryske kompj�terwurden, with *different* page IDs. More
specifically:
http://techbase.kde.org/api.php?action=query&prop=info&pageids=9156…
One of them is the title Fryske kompjûterwurden encoded as Latin-1. Those bytes cannot be
decoded as UTF-8, so the û turns into the replacement character � (U+FFFD). You can see
this from the URLs:
http://techbase.kde.org/Localization/fy/Fryske_kompj%FBterwurden <-- %FB = û in Latin-1,
while
http://techbase.kde.org/Localization/fy/Fryske_kompj%C3%BBterwurden <-- %C3%BB = û in UTF-8.
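The encoding mismatch can be reproduced with the Python 3 standard library alone. This is a minimal sketch, assuming the underscore form of the title used in the URLs; it uses `urllib.parse.quote` with an explicit `encoding` argument to rebuild both URL paths, then decodes the Latin-1 bytes as UTF-8 to show where the U+FFFD comes from.

```python
from urllib.parse import quote

# Title as it appears in the URLs (underscores instead of spaces).
title = "Fryske_kompjûterwurden"

# "û" (U+00FB) is a single byte 0xFB in Latin-1 but two bytes in UTF-8.
latin1_bytes = title.encode("latin-1")
utf8_bytes = title.encode("utf-8")

# Percent-encoding each byte sequence reproduces the two URL paths.
latin1_url = quote(title, encoding="latin-1")
utf8_url = quote(title, encoding="utf-8")

# 0xFB is not a valid UTF-8 start byte, so decoding the Latin-1 bytes as
# UTF-8 with errors="replace" substitutes U+FFFD for it.
garbled = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_url)      # Fryske_kompj%FBterwurden
print(utf8_url)        # Fryske_kompj%C3%BBterwurden
print(ascii(garbled))  # 'Fryske_kompj\ufffdterwurden'
```

`ascii()` is used for the last line only so the output stays readable on terminals that cannot display �.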
I think this qualifies as an API bug, as the rest of MediaWiki seems to be able to cope with
the incorrect encoding. I'll try to get one of the API developers to take a look.
---
**[bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open-accepted
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:24 PM UTC
**Owner:** nobody
This is happening with the following existing page:
http://techbase.kde.org/Localization/fy/Fryske_kompjûterwurden
Traceback (most recent call last):
  File "maintenance.py", line 81, in <module>
    main()
  File "maintenance.py", line 77, in main
    bot.run()
  File "/home/gallaecio/fontes/rodela/scripts/replace.py", line 326, in run
    for page in self.generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 799, in PreloadingGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 749, in DuplicateFilterPageGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 706, in __iter__
    yield self.result(item)
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 780, in result
    p = pywikibot.Page(self.site, pagedata['title'], pagedata['ns'])
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 77, in __init__
    self._link = Link(title, source=source, defaultNamespace=ns)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 2958, in __init__
    raise pywikibot.Error("Title contains illegal char (\\uFFFD)")
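Until the API returns a decodable title, one possible stopgap on the bot side is to drop such entries before they reach `pywikibot.Page` (the call at api.py line 780 in the traceback). The sketch below is hypothetical: `skip_undecodable_titles` is not part of pywikibot, and the dicts only mimic the `pagedata` entries with `title` and `ns` keys seen in the traceback.

```python
from typing import Dict, Iterable, Iterator

# U+FFFD REPLACEMENT CHARACTER, produced when bytes fail to decode.
REPLACEMENT_CHAR = "\ufffd"


def skip_undecodable_titles(items: Iterable[Dict]) -> Iterator[Dict]:
    """Yield only the page entries whose title contains no U+FFFD.

    Hypothetical helper, not part of pywikibot: each item mimics the
    'pagedata' dict from api.py, with at least 'title' and 'ns' keys.
    """
    for item in items:
        if REPLACEMENT_CHAR in item["title"]:
            # Passing this title on to pywikibot.Page would raise
            # "Title contains illegal char (\uFFFD)", so drop it here.
            continue
        yield item


# Illustrative sample entries built from the two titles in this report.
entries = [
    {"ns": 0, "title": "Localization/fy/Fryske kompjûterwurden"},
    {"ns": 0, "title": "Localization/fy/Fryske kompj\ufffdterwurden"},
]
kept = [item["title"] for item in skip_undecodable_titles(entries)]
print(ascii(kept))
```

Skipping means the bot never processes the mis-encoded entry, so this only hides the symptom; it does not replace a fix in the API.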
---
Sent from sourceforge.net because Pywikipedia-bugs(a)lists.wikimedia.org is subscribed to
https://sourceforge.net/p/pywikipediabot/bugs/