I have written the following script:
# -*- coding: utf-8 -*-
import pywikibot
site = pywikibot.Site("gl", "wiktionary")
page = pywikibot.Page(site, u"𐌰𐌽𐌳𐌰𐌿𐍂𐌰")
print page.get()
It fails with the following output:
[gallaecio@afonso fontes]$ python2 test.py
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print page.get()
  File "/usr/lib/python2.7/site-packages/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/usr/lib/python2.7/site-packages/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/usr/lib/python2.7/site-packages/pywikibot/page.py", line 278, in get
    self._getInternals(sysop)
  File "/usr/lib/python2.7/site-packages/pywikibot/page.py", line 302, in _getInternals
    self.site.loadrevisions(self, getText=True, sysop=sysop)
  File "/usr/lib/python2.7/site-packages/pywikibot/page.py", line 96, in site
    return self._link.site
  File "/usr/lib/python2.7/site-packages/pywikibot/page.py", line 3080, in site
    self.parse()
  File "/usr/lib/python2.7/site-packages/pywikibot/page.py", line 3037, in parse
    u"contains illegal char(s) '%s'" % m.group(0))
pywikibot.exceptions.InvalidTitle: contains illegal char(s) '𐌰'
Just a try: what happens for pywikibot.output(page.get()) instead of print?
2013/7/28 Adrián Chaves Fernández adriyetichaves@gmail.com
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Thanks for the suggestion. I get the same result, though.
I should probably mention that it works with Asian characters, and that I'm running this in a UTF-8-enabled GNU/Linux terminal.
On Sunday, 28 July 2013 09:11:14, Bináris wrote:
Just a try: what happens for pywikibot.output(page.get()) instead of print?
Hi Adrián,
On 28 July 2013 09:01, Adrián Chaves Fernández adriyetichaves@gmail.com wrote:
page = pywikibot.Page(site, u"𐌰𐌽𐌳𐌰𐌿𐍂𐌰")
The problem is in our detection of illegal titles: it only accepts characters up to \uFFFF (i.e. the BMP[1]), and Gothic lies outside that range, in the SMP[2]. I've changed our whitelist (which allowed characters up to \uFFFF) to a blacklist in https://gerrit.wikimedia.org/r/78525 , which should fix this issue.
[1] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
[2] https://en.wikipedia.org/wiki/Plane_(Unicode)#Supplementary_Multilingual_Pla...
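To illustrate the difference between the two approaches, here is a minimal sketch written for Python 3. It is not the actual pywikibot code: the two patterns and the helper first_illegal() are hypothetical, and the blacklist only covers control characters plus a handful of characters MediaWiki rejects in titles.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical whitelist check: anything outside U+0020..U+FFFF (the BMP)
# is reported as illegal, so every supplementary-plane character fails.
WHITELIST_ILLEGAL = re.compile(u"[^\u0020-\uFFFF]")

# Hypothetical blacklist check: only characters known to be invalid are
# reported (control characters and # < > [ ] | { }); all else is accepted.
BLACKLIST_ILLEGAL = re.compile(r"[\x00-\x1f\x7f#<>\[\]|{}]")


def first_illegal(pattern, title):
    """Return the first character of title matched by pattern, or None."""
    match = pattern.search(title)
    return match.group(0) if match else None


gothic = u"𐌰𐌽𐌳𐌰𐌿𐍂𐌰"  # Gothic letters, all above U+FFFF
cjk = u"日本語"          # CJK ideographs, inside the BMP

print(first_illegal(WHITELIST_ILLEGAL, gothic) is not None)  # rejected
print(first_illegal(WHITELIST_ILLEGAL, cjk) is None)         # accepted
print(first_illegal(BLACKLIST_ILLEGAL, gothic) is None)      # accepted
print(first_illegal(BLACKLIST_ILLEGAL, u"foo|bar"))          # still rejected
```

This also shows why Asian characters worked before the change: they sit inside the BMP, so the whitelist never flagged them.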
Best, Merlijn