- **status**: open --> closed-fixed
---
** [bugs:#1659] harvest_template adds incorrect source**
**Status:** closed-fixed
**Created:** Mon Aug 26, 2013 10:37 AM UTC by Merlijn S. van Deen
**Last Updated:** Mon Aug 26, 2013 11:01 AM UTC
**Owner:** nobody
See http://ultimategerardm.blogspot.nl/2013/08/reviving-my-bot-for-wikidata-ii.…
to reproduce:
~~~~~
>>> from scripts import harvest_template
>>> R = harvest_template.HarvestRobot('x', 'y', 'z')
>>> R.setSource('en')
>>> R.source
Claim(Property:P143)
>>> R.source.target
ItemPage(Q4115441)
~~~~~~
That should be ItemPage(Q328) (enwiki), not ItemPage(Q4115441) (sewiki)
---
Sent from sourceforge.net because Pywikipedia-bugs(a)lists.wikimedia.org is subscribed to https://sourceforge.net/p/pywikipediabot/bugs/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/pywikipediabot/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
https://gerrit.wikimedia.org/r/80976
---
** [bugs:#1659] harvest_template adds incorrect source**
**Status:** open
**Created:** Mon Aug 26, 2013 10:37 AM UTC by Merlijn S. van Deen
**Last Updated:** Mon Aug 26, 2013 10:38 AM UTC
**Owner:** nobody
---
- Description has changed:
Diff:
~~~~
--- old
+++ new
@@ -2,6 +2,7 @@
to reproduce:
+~~~~~
>>> from scripts import harvest_template
>>> R = harvest_template.HarvestRobot('x', 'y', 'z')
>>> R.setSource('en')
@@ -9,6 +10,6 @@
Claim(Property:P143)
>>> R.source.target
ItemPage(Q4115441)
-
+~~~~~~
That should be ItemPage(Q328) (enwiki), not ItemPage(Q4115441) (sewiki)
~~~~
---
** [bugs:#1659] harvest_template adds incorrect source**
**Status:** open
**Created:** Mon Aug 26, 2013 10:37 AM UTC by Merlijn S. van Deen
**Last Updated:** Mon Aug 26, 2013 10:37 AM UTC
**Owner:** nobody
---
** [bugs:#1659] harvest_template adds incorrect source**
**Status:** open
**Created:** Mon Aug 26, 2013 10:37 AM UTC by Merlijn S. van Deen
**Last Updated:** Mon Aug 26, 2013 10:37 AM UTC
**Owner:** nobody
---
I have submitted a patch that clarifies what \uFFFD is: https://gerrit.wikimedia.org/r/80865
Apart from that, I think this is a site-specific issue, and thus consider this issue solved :-)
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** closed-wont-fix
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:30 PM UTC
**Owner:** nobody
This is happening with the following existing page: http://techbase.kde.org/Localization/fy/Fryske_kompjûterwurden
Traceback (most recent call last):
  File "maintenance.py", line 81, in <module>
    main()
  File "maintenance.py", line 77, in main
    bot.run()
  File "/home/gallaecio/fontes/rodela/scripts/replace.py", line 326, in run
    for page in self.generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 799, in PreloadingGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 749, in DuplicateFilterPageGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 706, in __iter__
    yield self.result(item)
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 780, in result
    p = pywikibot.Page(self.site, pagedata['title'], pagedata['ns'])
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 77, in __init__
    self._link = Link(title, source=source, defaultNamespace=ns)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 2958, in __init__
    raise pywikibot.Error("Title contains illegal char (\\uFFFD)")
---
- **status**: open-accepted --> closed-wont-fix
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** closed-wont-fix
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:30 PM UTC
**Owner:** nobody
---
OK, now I'm really confused. Look at this:
http://techbase.kde.org/api.php?action=query&prop=info&pageids=9156|6713&in…
In 2010, Ytsma moved the page from the (correct, if read as UTF-8) "Localization/fy/Fryske kompj%C3%BBterwurden" to the (correct, if read as latin-1) "Fryske_kompj%FBterwurden". The content now lives under that latin-1 title, but it is inaccessible from the web UI and can only be reached by page ID through the API!
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open-accepted
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:25 PM UTC
**Owner:** nobody
---
The problem seems to be in the API result:
http://techbase.kde.org/api.php?action=query&list=allpages&apfrom=Localizat…
This shows two results: the correct Localization/fy/Fryske kompjûterwurden and the incorrect Localization/fy/Fryske kompj�terwurden, with *different* page ids. More specifically:
http://techbase.kde.org/api.php?action=query&prop=info&pageids=9156|6713&in…
One of them is Fryske kompjûterwurden encoded as latin-1. Its bytes cannot be decoded as UTF-8, so the û comes out as a � character. You can see this from the URL:
http://techbase.kde.org/Localization/fy/Fryske_kompj%FBterwurden <-- %FB = û in latin-1,
while
http://techbase.kde.org/Localization/fy/Fryske_kompj%C3%BBterwurden <-- %C3%BB = û in utf-8.
I think this qualifies as an API bug, as the rest of MediaWiki seems to be able to cope with the incorrect encoding. I'll try to get one of the API developers to take a look.
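For illustration, the encoding mismatch described above can be reproduced with a minimal Python 3 sketch. This is not pywikibot or MediaWiki code: the helper name `decode_title` is made up for this example, and it assumes that whatever produced the title decoded the stored bytes as UTF-8 with invalid bytes replaced by U+FFFD.

```python
from urllib.parse import quote, unquote_to_bytes

TITLE = "Fryske_kompjûterwurden"

# Percent-encode the same title under both encodings, as in the two URLs above.
utf8_url = quote(TITLE, encoding="utf-8")      # û becomes %C3%BB
latin1_url = quote(TITLE, encoding="latin-1")  # û becomes %FB


def decode_title(percent_encoded: str) -> str:
    """Hypothetical helper: decode a percent-encoded title as UTF-8,
    substituting U+FFFD for any byte sequence that is not valid UTF-8."""
    raw = unquote_to_bytes(percent_encoded)
    return raw.decode("utf-8", errors="replace")


good = decode_title(utf8_url)   # round-trips to the original title
bad = decode_title(latin1_url)  # the lone 0xFB byte becomes U+FFFD

print(utf8_url)
print(latin1_url)
print(good == TITLE)
print("\ufffd" in bad)
```

Decoding the second URL as latin-1 instead (`unquote_to_bytes(latin1_url).decode("latin-1")`) gives back the intended title, which fits the observation that the page was stored under a latin-1 byte sequence.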
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open-accepted
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:24 PM UTC
**Owner:** nobody
---
Yes, I forgot to mention that I’m working with http://sourceforge.net/p/pywikipediabot/patches/622/
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open-accepted
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:08 PM UTC
**Owner:** nobody
---
- **status**: open --> open-accepted
- **Group**: confirmed --> rewrite
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open-accepted
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 03:07 PM UTC
**Owner:** nobody
---