JAnD created this task. JAnD added a subscriber: JAnD. JAnD added a project: pywikibot-core.
TASK DESCRIPTION zh-min-nan wiktionary returns different names of pages:
I:\py\rewrite>pwb.py interwiki -family:wiktionary -subcats:Gí-giân -cleanup -lang:zh-min-nan -async -whenneeded:5 -untranslated
```
NOTE: Number of pages queued is 0, trying to add 50 more.
Retrieving 36 pages from wiktionary:zh-min-nan.
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Bân-lâm-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hôa-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Eng-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Ji?t-gí'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hui-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hoat-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Tek-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Se-pan-gâ-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Phux-tô-gâ-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:O?at-lâm-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:A-la-pek-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Ke-te-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Dan-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hun-lân-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:In-nî-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Í-tai-li-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Lo -se-a-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hi-lia?p-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:La-teng-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Pe?h lo -se-a- gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hi-pek-lâi-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Se-kai-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Peng-te-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Pho-lân-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Lâm-hui-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Bông-kó -gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Pho-su-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Thai-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Má-lâi-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Ido-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hân-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Thó -ní-kî-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Mî-iux?-tó-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Bân-tang-oe'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Kheh-oe'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Tagalog-gú'
Dump nan (wiktionary) appended.
Traceback (most recent call last):
  File "I:\py\rewrite\pwb.py", line 222, in <module>
    run_python_file(filename, argv, argvu, file_package)
  File "I:\py\rewrite\pwb.py", line 81, in run_python_file
    main_mod.__dict__)
  File ".\scripts\interwiki.py", line 2647, in <module>
    main()
  File ".\scripts\interwiki.py", line 2622, in main
    bot.run()
  File ".\scripts\interwiki.py", line 2365, in run
    self.queryStep()
  File ".\scripts\interwiki.py", line 2338, in queryStep
    self.oneQuery()
  File ".\scripts\interwiki.py", line 2334, in oneQuery
    subject.batchLoaded(self)
  File ".\scripts\interwiki.py", line 1321, in batchLoaded
    elif page.isRedirectPage() or page.isCategoryRedirect():
  File "I:\py\rewrite\pywikibot\page.py", line 644, in isCategoryRedirect
    for (template, args) in self.templatesWithParams():
  File "I:\py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "I:\py\rewrite\pywikibot\page.py", line 1869, in templatesWithParams
    templates = textlib.extract_templates_and_params(self.text)
  File "I:\py\rewrite\pywikibot\page.py", line 440, in text
    self._text = self.get(get_redirect=True)
  File "I:\py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "I:\py\rewrite\pywikibot\page.py", line 349, in get
    self._getInternals(sysop)
  File "I:\py\rewrite\pywikibot\page.py", line 373, in _getInternals
    self.site.loadrevisions(self, getText=True, sysop=sysop)
  File "I:\py\rewrite\pywikibot\site.py", line 3167, in loadrevisions
    % (page, pagedata['title']))
pywikibot.exceptions.Error: loadrevisions: Query on [[zh-min-nan:ňłćÚí×:L┼źi-pia ╠Źt:A-la-pek-g├║]] returned data on 'L┼źi-pia╠Źt:L┼źi-pia╠Źt:A-la-pek-g├║'
<class 'pywikibot.exceptions.Error'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
```
I:\py\rewrite>
TASK DETAIL https://phabricator.wikimedia.org/T86696
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: JAnD Cc: Aklapper, JAnD, jayvdb, pywikipedia-bugs
JAnD added a comment.
It seems that the API returns a different name for the category namespace than the one that is displayed.
For compat this could be solved by adding ` self.namespaces[14]['zh-min-nan'] = [u'Lūi-pia̍t', u'分類']` to families/wiktionary_family.py,
but for core I don't know where the namespace names are defined.
XZise added a subscriber: XZise. XZise added a comment.
The namespace is queried dynamically, so it does appear in: http://zh-min-nan.wiktionary.org/w/api.php?action=query&meta=siteinfo&am...
If you search for `: 14` you get two results: one contains a name with two characters and the other looks like “Lūi-pia̍t”.
Now the stack trace is hard to decipher, but I guess it queries `分類:Lūi-pia̍t:…` and gets a result for `Lūi-pia̍t:Lūi-pia̍t:…`. It is strange that the namespace name appears twice, but pywikibot should still treat both titles as the same, since `APISite.sametitle` determines the namespace ID and also supports the aliases.
>>> import pywikibot
>>> s = pywikibot.Site('zh-min-nan', 'wiktionary')
>>> s.sametitle('分類:Lūi-pia̍t:…', 'Lūi-pia̍t:Lūi-pia̍t:…')
True
>>> s.namespaces[14]
Namespace(id=14, custom_name='Lūi-pia̍t', canonical_name='Category', aliases=['分類'], case='case-sensitive')
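To make the alias handling above easier to follow without a live wiki, here is a minimal offline sketch of alias-aware title comparison. It is not pywikibot's implementation: the `Namespace` dataclass, `split_title` and `same_title` are hypothetical names invented for this illustration, and only the namespace data (ID 14, `Lūi-pia̍t`, `Category`, `分類`) comes from the output shown in this comment.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical stand-in for the namespace record printed above."""
    id: int
    custom_name: str
    canonical_name: str
    aliases: list = field(default_factory=list)

    def names(self):
        """All prefixes that should resolve to this namespace."""
        return [self.custom_name, self.canonical_name] + list(self.aliases)


def _fold(text):
    """Normalize Unicode form and case so equivalent prefixes compare equal."""
    return unicodedata.normalize('NFC', text.strip()).lower()


def split_title(title, namespaces):
    """Return (namespace id, rest of title); id 0 if the prefix is unknown."""
    prefix, sep, rest = title.partition(':')
    if sep:
        for ns in namespaces:
            if _fold(prefix) in {_fold(name) for name in ns.names()}:
                return ns.id, unicodedata.normalize('NFC', rest.strip())
    return 0, unicodedata.normalize('NFC', title.strip())


def same_title(first, second, namespaces):
    """Compare two titles after resolving namespace names and aliases."""
    return split_title(first, namespaces) == split_title(second, namespaces)


# Namespace data copied from the s.namespaces[14] output in this comment.
CATEGORY = Namespace(14, 'Lūi-pia̍t', 'Category', ['分類'])

print(same_title('分類:A-la-pek-gú', 'Lūi-pia̍t:A-la-pek-gú', [CATEGORY]))
print(same_title('分類:Lūi-pia̍t:A-la-pek-gú', '分類:A-la-pek-gú', [CATEGORY]))
```

Only the first prefix is stripped, so a title with a doubled namespace prefix still differs from the title with a single prefix, even though the alias and the custom name compare equal.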
XZise added a comment.
By the way I was trying to get the revisions for that page manually but I can't determine the page name you are using because of all that gibberish. Maybe someone knows how to convert that into Unicode?
XZise added a comment.
Okay, after a bit of trickery I was able to determine that the UTF-8 content was decoded as cp852 instead:
>>> 'ňłćÚí×:L┼źi-pia╠Źt:A-la-pek-g├║'.encode('cp852').decode('utf8')
'分類:Lūi-pia̍t:A-la-pek-gú'
And as I thought, the API returned a result for the other namespace name: https://zh-min-nan.wiktionary.org/w/api.php?action=query&prop=revisions&... (apart from the fact that it says missing)
I don't get your error when I try to get the revisions (and the error I do get is correct). When I remove one of the namespace prefixes but still use the namespace alias, it works.
>>> import pywikibot
>>> p = 'ňłćÚí×:L┼źi-pia╠Źt:A-la-pek-g├║'.encode('cp852').decode('utf8')
>>> po = pywikibot.Page(pywikibot.Site('zh-min-nan', 'wiktionary'), p)
>>> po.exists()
False
>>> list(po.revisions(total=1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xzise/Programms/core/pywikibot/page.py", line 1374, in revisions
    step=step, total=total)
  File "/home/xzise/Programms/core/pywikibot/site.py", line 3169, in loadrevisions
    raise NoPage(page)
pywikibot.exceptions.NoPage: Page [[wiktionary:zh-min-nan:Lūi-pia̍t:Lūi-pia̍t:A-la-pek-gú]] doesn't exist.
>>> po = pywikibot.Page(pywikibot.Site('zh-min-nan', 'wiktionary'), '分類:A-la-pek-gú')
>>> po.exists()
True
>>> list(po.revisions(total=1))
[<pywikibot.page.Revision object at 0x7f1d1ebc2fd0>]
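The cp852 mix-up described in this comment can be reproduced without touching the wiki. The following is a small sketch using only Python 3 built-in codecs; the helper names `garble` and `ungarble` are made up for this illustration, and cp852 is assumed to be the reporter's console code page only because that is what the recovery above suggests.

```python
def garble(text, wrong_codec='cp852'):
    """Simulate UTF-8 bytes being displayed with the wrong code page."""
    return text.encode('utf-8').decode(wrong_codec)


def ungarble(text, wrong_codec='cp852'):
    """Undo the damage: recover the raw bytes, then read them as UTF-8."""
    return text.encode(wrong_codec).decode('utf-8')


title = '分類:A-la-pek-gú'
mojibake = garble(title)

# The garbled form resembles the page title in the reporter's traceback.
print(mojibake)
# The round trip restores the original title.
print(ungarble(mojibake) == title)
```

The round trip is lossless here because each byte of the UTF-8 title maps to a distinct cp852 character. It breaks down when the console has already replaced characters it cannot show (for example with `?`, as in the warnings in the task description).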
valhallasw added a subscriber: valhallasw. valhallasw added a comment.
@jand, could you post your user-config.py? Do you have any transliteration_target and console_encoding set?
XZise edited the task description. XZise set Security to none.
valhallasw added a comment.
Two more questions:
- do you get this immediately after starting the interwiki bot? Because I get
Retrieving 35 pages from wiktionary:zh-min-nan.
[[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]]: [[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]] gives new interwiki [[af:Kategorie:Woorde in Arabies]]
[[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]]: [[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]] gives new interwiki [[ar:تصنيف:عربية]]
[[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]]: [[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]] gives new interwiki [[ast:Categoría:Árabe]]
etc. Can you re-run with -debug, and provide the debug log that is then stored in debug/
- Could you test whether
python pwb.py listpages -family:wiktionary -lang:zh-min-nan -subcats:Gí-giân -v -debug -get
gives the same error for you?
XZise added a comment.
Okay, one question from me now: those warnings that a query returned unexpected titles, are they new? They also rely on `APISite.sametitle`, so they could be connected to your original problem.
JAnD added a comment.
In https://phabricator.wikimedia.org/T86696#974780, @XZise wrote:
Okay, one question from me now: those warnings that a query returned unexpected titles, are they new? They also rely on `APISite.sametitle`, so they could be connected to your original problem.
This error appeared in the middle of a run, so maybe it was caused by some server-side change, because later there were new names of categories on zh-min-nan.wikt.
Because of https://phabricator.wikimedia.org/T86621 I am still not able to test it properly on my work PC. I'll try to completely reinstall pywikibot there :-(
The base directory is d:\Py\rewrite
=== Pywikibot framework v2.0 -- Logging header ===
COMMAND: ['listpages', '-family:wiktionary', '-lang:zh-min-nan', '-subcats:G\xed-gi\xe2n', '-v', '-debug', '-get']
DATE: 2015-01-14 06:54:05.896000 UTC
VERSION: pywikibot-core (161110a, s5977, 2015/01/12, 21:07:52, n/a)
CONFIG FILE DIR: d:\Py\rewrite
PACKAGES:
  _ctypes (C:\Python27\DLLs\_ctypes.pyd) = 1.1.0
  _hashlib (C:\Python27\DLLs\_hashlib.pyd) = ??
  _socket (C:\Python27\DLLs\_socket.pyd) = ??
  _sqlite3 (C:\Python27\DLLs\_sqlite3.pyd) = ??
  _ssl (C:\Python27\DLLs\_ssl.pyd) = ??
  ctypes (C:\Python27\lib\ctypes) = 1.1.0
  distutils (C:\Python27\lib\distutils) = 2.7.3
  email (C:\Python27\lib\email) = 4.0.3
  logging (C:\Python27\lib\logging) = 0.5.1.2
  mwparserfromhell: No module named mwparserfromhell
  pickle (C:\Python27\lib\pickle.pyc) = $Revision: 72223 $
  pyexpat (C:\Python27\DLLs\pyexpat.pyd) = 2.7.3
  pywikibot ([path unknown]) = ??
  re (C:\Python27\lib\re.pyc) = 2.2.1
  unicodedata (C:\Python27\DLLs\unicodedata.pyd) = ??
  urllib (C:\Python27\lib\urllib.pyc) = 1.17
  urllib2 (C:\Python27\lib\urllib2.pyc) = 2.7
MODULES:
  pywikibot/comms/http.py 2015-01-13 09:16:52.361998
  pywikibot/data/api.py 2015-01-13 09:16:52.344997
  pywikibot/textlib.py 530dc70 2015-01-12 01:02:38
  pywikibot/i18n.py 4a96bbe 2015-01-13 09:05:27.612832
  pywikibot/comms/threadedhttp.py 8de4213 2015-01-12 01:02:38
  pywikibot/date.py 36dc254 2015-01-12 01:02:38
  pywikibot/exceptions.py 2a948c5 2015-01-12 01:02:38
  pywikibot/site.py 2015-01-13 09:17:32.105271
  pywikibot/bot.py 2015-01-13 09:11:58.632197
  pywikibot/throttle.py a311a20 2015-01-12 01:02:38
  pywikibot/page.py 2015-01-13 09:12:04.328523
  pywikibot/family.py 2015-01-13 09:12:02.689430
  pywikibot/plural.py 02a50e4 2015-01-12 01:02:38
  pywikibot/version.py 2229075 2015-01-12 01:02:38
  pywikibot/userinterfaces/terminal_interface.py b0e2743 2015-01-12 01:02:38
  pywikibot/config2.py 2015-01-13 09:12:00.894327
  pywikibot/userinterfaces/terminal_interface_win32.py 7e3fd89 2015-01-12 01:02:38
  pywikibot/userinterfaces/terminal_interface_base.py 84f7102 2015-01-13 09:13:31.471508
  pywikibot/pagegenerators.py aeb2d15 2015-01-12 01:02:38
  pywikibot/tools.py 2015-01-13 09:16:52.399000
  pywikibot/diff.py 09fdfdf 2015-01-12 01:02:38
  pywikibot/login.py db7be8f 2015-01-12 01:02:38
  pywikibot/userinterfaces/transliteration.py 1d8e217 2015-01-12 01:02:38
=== === === === === === === === === === === === === ===
Pywikibot r7fd7983bff6db53bed6a75f5137623890f7a5292
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
Found 1 wiktionary:zh-min-nan processes running, including this one.
ERROR: Traceback (most recent call last):
  File "D:\Py\rewrite\pywikibot\data\api.py", line 983, in submit
    headers=headers, body=body)
  File "D:\Py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "D:\Py\rewrite\pywikibot\comms\http.py", line 248, in request
    baseuri = site.base_url(uri)
  File "D:\Py\rewrite\pywikibot\site.py", line 641, in __getattr__
    % (self.__class__.__name__, attr))
AttributeError: APISite instance has no attribute 'base_url'
/w/api.php?maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg, maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg
WARNING: Waiting 5 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "D:\Py\rewrite\pywikibot\data\api.py", line 983, in submit
    headers=headers, body=body)
  File "D:\Py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "D:\Py\rewrite\pywikibot\comms\http.py", line 248, in request
    baseuri = site.base_url(uri)
  File "D:\Py\rewrite\pywikibot\site.py", line 641, in __getattr__
    % (self.__class__.__name__, attr))
AttributeError: APISite instance has no attribute 'base_url'
/w/api.php?maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg, maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg
WARNING: Waiting 10 seconds before retrying.
JAnD closed this task as "Resolved". JAnD claimed this task. JAnD added a comment.
Probably only some server-side change; now it works again.
XZise lowered the priority of this task from "Unbreak Now!" to "Normal". XZise placed this task up for grabs.
XZise added a comment.
I'm curious what server-side change this might be…
Also, please don't immediately mark your bug reports as “Unbreak Now!” (or another high priority). Obviously you want the bug to be fixed, but doesn't everybody want that? And as we saw, both this task and https://phabricator.wikimedia.org/T86621, which were marked as “Unbreak Now!”, turned out to be bugs unrelated to pywikibot (although I'm not sure what went wrong here).
jayvdb changed the title from "zh-min-nan " to "zh-min-nan wiktionary preloadpages".