Arjunaraoc created this task. Arjunaraoc added a subscriber: Arjunaraoc. Arjunaraoc added a project: pywikibot-core. Restricted Application added subscribers: Aklapper, pywikipedia-bugs.
TASK DESCRIPTION
**Log from a debug run on Ubuntu 12.04**
The earlier version worked for some files and crashed with a similar error before I decided to update to the latest core version. Now it does not work at all.
$python pwb.py replace.py -regex '(?s)^(.*)$' "{{యాంత్రిక అనువాదం}}\1" -file:"/home/arjun/RCourse/tewsn/gtp2mk2.txt" -simulate -debug -v -log
The base directory is /home/arjun/corenew/core
=== Pywikibot framework v2.0 -- Logging header ===
COMMAND: ['replace.py', '-regex', '(?s)^(.*)$', '{{\xe0\xb0\xaf\xe0\xb0\xbe\xe0\xb0\x82\xe0\xb0\xa4\xe0\xb1\x8d\xe0\xb0\xb0\xe0\xb0\xbf\xe0\xb0\x95 \xe0\xb0\x85\xe0\xb0\xa8\xe0\xb1\x81\xe0\xb0\xb5\xe0\xb0\xbe\xe0\xb0\xa6\xe0\xb0\x82}}\1', '-file:/home/arjun/RCourse/tewsn/gtp2mk2.txt', '-simulate', '-debug', '-v', '-log']
DATE: 2015-05-14 03:56:51.045862 UTC
VERSION: [https] r-pywikibot-core.git (41901c7, g5504, 2015/05/11, 20:05:08, n/a)
SYSTEM: ('Linux', 'arjun-945GCM-S2L', '3.2.48-ctl471', '#1 SMP Thu Aug 15 09:42:30 IST 2013', 'i686')
CONFIG FILE DIR: /home/arjun/corenew/core
PACKAGES:
  Tkinter (/usr/lib/python2.7/lib-tk/Tkinter.pyc) = $Revision: 81008 $
  distutils (/usr/lib/python2.7/distutils/) = 2.7.3
  email (/usr/lib/python2.7/email/) = 4.0.3
  json (/usr/lib/python2.7/json/) = 2.0.9
  logging (/usr/lib/python2.7/logging/) = 0.5.1.2
  mpl_toolkits (/usr/lib/pymodules/python2.7/mpl_toolkits/) = ??
  mwparserfromhell: No module named mwparserfromhell
  pickle (/usr/lib/python2.7/pickle.pyc) = $Revision: 72223 $
  pywikibot ([path unknown]) = ??
  re (/usr/lib/python2.7/re.pyc) = 2.2.1
  setuptools (/usr/lib/python2.7/dist-packages/setuptools/) = 0.6
  urllib (/usr/lib/python2.7/urllib.pyc) = 1.17
  urllib2 (/usr/lib/python2.7/urllib2.pyc) = 2.7
MODULES:
  /home/arjun/corenew/core/pywikibot/textlib.py 462afa2 2015-05-10 17:15:28.994976
  /home/arjun/corenew/core/pywikibot/data/api.py c6cbf67 2015-05-10 17:15:28.922976
  /home/arjun/corenew/core/pywikibot/userinterfaces/__init__.py 43eceeb 2015-05-10 17:15:28.998976
  /home/arjun/corenew/core/pywikibot/i18n.py 77e57c6 2015-05-10 17:15:28.946976
  /home/arjun/corenew/core/pywikibot/comms/threadedhttp.py 69cf1f8 2015-05-13 15:21:59.643962
  /home/arjun/corenew/core/pywikibot/date.py 262e786 2015-05-10 17:15:28.930976
  /home/arjun/corenew/core/pywikibot/data/__init__.py 44183c7 2015-05-10 17:15:28.914976
  /home/arjun/corenew/core/pywikibot/fixes.py d84788c 2015-05-10 17:15:28.946976
  /home/arjun/corenew/core/pywikibot/exceptions.py 4a30d02 2015-05-13 15:21:59.655962
  /home/arjun/corenew/core/pywikibot/site.py eae3500 2015-05-13 15:21:59.699962
  /home/arjun/corenew/core/pywikibot/bot.py de68ebd 2015-05-10 17:15:28.902976
  /home/arjun/corenew/core/pywikibot/__init__.py d6ea7ec 2015-05-13 15:21:59.639962
  /home/arjun/corenew/core/pywikibot/throttle.py 4157254 2015-05-10 17:15:28.994976
  /home/arjun/corenew/core/pywikibot/page.py db5a1ef 2015-05-13 15:21:59.675962
  /home/arjun/corenew/core/pywikibot/editor.py 7d0aa1b 2015-05-10 17:15:28.934976
  /home/arjun/corenew/core/pywikibot/family.py b978cc6 2015-05-13 15:21:59.663963
  /home/arjun/corenew/core/pywikibot/plural.py c9edb6b 2015-05-10 17:15:28.970976
  /home/arjun/corenew/core/pywikibot/version.py 8de383e 2015-05-10 17:15:29.010976
  /home/arjun/corenew/core/pywikibot/userinterfaces/terminal_interface.py 9a5fbf1 2015-05-10 17:15:28.998976
  /home/arjun/corenew/core/pywikibot/config2.py 971b19d 2015-05-10 17:15:28.910976
  /home/arjun/corenew/core/pywikibot/tools/ip.py 808c0cc 2015-05-10 17:15:28.998976
  /home/arjun/corenew/core/pywikibot/comms/http.py b336a0a 2015-05-13 15:21:59.643962
  /home/arjun/corenew/core/pywikibot/userinterfaces/terminal_interface_base.py 968a14b 2015-05-10 17:15:29.002976
  /home/arjun/corenew/core/pywikibot/pagegenerators.py 12e7523 2015-05-13 15:21:59.679962
  /home/arjun/corenew/core/pywikibot/userinterfaces/terminal_interface_unix.py 60d8cb2 2015-05-10 17:15:29.002976
  /home/arjun/corenew/core/pywikibot/tools/__init__.py 692fc89 2015-05-13 15:21:59.699962
  /home/arjun/corenew/core/pywikibot/diff.py 015dcbd 2015-05-10 17:15:28.934976
  /home/arjun/corenew/core/pywikibot/login.py 70f3f31 2015-05-13 15:21:59.663963
  /home/arjun/corenew/core/pywikibot/comms/__init__.py 747d0a7 2015-05-10 17:15:28.902976
  /home/arjun/corenew/core/pywikibot/userinterfaces/transliteration.py efd4103 2015-05-10 17:15:29.010976
=== === === === === === === === === === === === === ===
Pywikibot rd6ea7ece4f4d7867f211e16c13e62c9366627207
Python 2.7.3 (default, Dec 18 2014, 19:03:52) [GCC 4.6.3]
The summary message for the command line replacements will be something like: Bot: Automated text replacement (-(?s)^(.*)$ +{{యాంత్రిక అనువాదం}}\1)
Press Enter to use this automatic message, or enter a description of the changes your bot will make: +{{యాంత్రిక అనువాదం}}
LOADING SITE wikipedia:te VERSION: 1.26wmf4
Found 1 wikipedia:te processes running, including this one.
Retrieving 50 pages from wikipedia:te.
^CTraceback (most recent call last):
  File "pwb.py", line 239, in <module>
    if not main():
  File "pwb.py", line 233, in main
    run_python_file(filename, argv, argvu, file_package)
  File "pwb.py", line 88, in run_python_file
    main_mod.__dict__)
  File "./scripts/replace.py", line 947, in <module>
    main()
  File "./scripts/replace.py", line 938, in main
    bot.run()
  File "./scripts/replace.py", line 589, in run
    new_text = self.apply_replacements(last_text, applied)
  File "./scripts/replace.py", line 516, in apply_replacements
    allowoverlap=self.allowoverlap, site=self.site)
  File "/home/arjun/corenew/core/pywikibot/textlib.py", line 308, in replaceExcept
    (match.group(groupID) or '') +
KeyboardInterrupt
Dropped throttle(s).
<type 'exceptions.KeyboardInterrupt'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
All threads finished.
TASK DETAIL https://phabricator.wikimedia.org/T99032
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Arjunaraoc Cc: pywikipedia-bugs, Aklapper, Arjunaraoc, jayvdb
Arjunaraoc added a comment.
On investigation, this problem happens only when replace.py is run with the -file:"filename" parameter.
Arjunaraoc added a comment.
The problem was traced to some specific pages that were translated using Google Translate.
A test page has been created here: https://te.wikipedia.org/wiki/user:Arjunaraoc/sandbox_https://phabricator.wi... Arjunaraoc/sandbox https://phabricator.wikimedia.org/T99032
XZise added a subscriber: XZise. XZise added a comment.
Regarding the order in core, https://gerrit.wikimedia.org/r/#/c/199631/ might fix this. replace.py preloads the pages, and core returns them in the order in which the API returned them.
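To illustrate the ordering point, here is a minimal sketch of putting preloaded pages back into the order they were requested in. It is not the code from the Gerrit change; `in_requested_order` and the `(title, text)` pairs are hypothetical names used only for this example.

```python
def in_requested_order(requested_titles, returned_pages):
    """Yield returned pages following the order of requested_titles.

    returned_pages is an iterable of (title, text) pairs in whatever order
    a (hypothetical) API response delivered them. Titles missing from the
    response are skipped.
    """
    by_title = dict(returned_pages)
    for title in requested_titles:
        if title in by_title:
            yield title, by_title[title]


requested = ['B', 'A', 'C']
returned = [('A', 'text of A'), ('C', 'text of C'), ('B', 'text of B')]
print(list(in_requested_order(requested, returned)))
```

Indexing the response by title keeps the reordering linear in the number of pages.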
Arjunaraoc added a comment.
Yes, the order in core is working now.
Arjunaraoc added a comment.
It looks like, in at least one instance, replace.py crashes if the text contains malformed template code such as a missing brace ( {{...} ).
Mpaa added a subscriber: Mpaa. Mpaa added a comment.
Can you post the page and the command to reproduce it?
XZise added a comment.
@Mpaa: This was mostly given in the opening post and in the comment that mentions the sandbox page, but here it is for everyone to copy:
`python pwb.py replace.py -regex '(?s)^(.*)$' "{{యాంత్రిక అనువాదం}}\1" -simulate -lang:te -family:wikipedia -page:'user:Arjunaraoc/sandbox_https://phabricator.wikimedia.org/T99032' -debug -v -log`
As far as I can see, it also hangs for me using Python 3.4.2.
I also tried to reproduce your other issue, the crash. I personally don't really understand why it would hang in one situation and crash in another. With `python pwb.py replace.py -regex '(?s)^(.*)$' "{{యాంత్రిక అనువాదం}}\1" -simulate -lang:te -family:wikipedia -page:Main\ Page -page:'user:Arjunaraoc/sandbox_https://phabricator.wikimedia.org/T99032' -debug -v -log` I first get asked about the Main Page (and that works), and then it hangs again. So either the crash is related to something else, or it's not simply triggered by an earlier replacement (it should be noted that I didn't save the changes, apart from the fact that I was using simulation mode).
It seems to be related to `replaceExcept`; at least in my tests and in the original post, the traceback ends (or starts) in `replaceExcept`. It might also be related to the fact that you try to match everything, and maybe it uses certain exceptions which have a problem with that. Anyway, with that command it's reproducible, so it's easier to look into.
And regarding my patch, you need to download it in order to have the “features” from it as it hasn't been merged. And it should only fix the order in which the bot works on the pages. If it also fixed your issues in this bug report you should tell us how you downloaded that patch. Because with that patch my computer hangs too.
XZise changed the title from "pywikibot replace.py hangs for any operation from linux/windows." to "pywikibot replace.py hangs on certain conditions". XZise edited the task description. XZise set Security to None.
XZise added a comment.
Ah! The first sandbox page contains `\0`, and `replaceExcept` searches for backslash+number to replace it with the group (like you did with `{{...}}\1`), but this time it's part of the page text. As far as I can see, the first pass works fine and generates `{{...}}REST OF PAGE`; then (for some reason) it searches the new result for references again and finds `\0`. That code loops forever unless it stops finding references: https://git.wikimedia.org/blob/pywikibot%2Fcore.git/7f50b4ed6bdf27e7e29e6de0...
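The behaviour described above can be reproduced with a small standalone sketch. This is a simplified illustration, not the actual `replaceExcept` code: the names `GROUP_REF`, `expand_rescanning` and `expand_single_pass` are made up for the example, only numeric references are handled, and `max_rounds` exists purely so the demonstration terminates instead of hanging.

```python
import re

# Matches numeric group references such as \1 (named references are left
# out to keep the sketch short).
GROUP_REF = re.compile(r'\\(\d+)')


def expand_rescanning(template, match, max_rounds=20):
    """Expand references by searching the whole string again each round.

    Text copied in from the page is searched too, so a literal
    backslash-zero in the page keeps the loop alive. max_rounds only
    exists so that this demonstration stops.
    """
    result = template
    for _ in range(max_rounds):
        ref = GROUP_REF.search(result)
        if ref is None:
            return result
        group_text = match.group(int(ref.group(1))) or ''
        result = result[:ref.start()] + group_text + result[ref.end():]
    raise RuntimeError('still finding references after %d rounds' % max_rounds)


def expand_single_pass(template, match):
    """Expand each reference in the template once, left to right.

    Inserted text is appended to the output and never searched again.
    """
    pieces = []
    position = 0
    for ref in GROUP_REF.finditer(template):
        pieces.append(template[position:ref.start()])
        pieces.append(match.group(int(ref.group(1))) or '')
        position = ref.end()
    pieces.append(template[position:])
    return ''.join(pieces)


# Hypothetical page text containing a literal backslash-zero.
page_text = 'Line one\nA literal \\0 inside the page\n'
match = re.search(r'(?s)^(.*)$', page_text)
template = '{{Example}}\\1'

print(expand_single_pass(template, match))
try:
    expand_rescanning(template, match)
except RuntimeError as error:
    print('rescanning version:', error)
```

In the rescanning variant every round replaces `\0` with the whole match, which itself contains `\0`, so the string only grows. Walking the replacement template once and never looking at the inserted text avoids that.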
XZise claimed this task. XZise added a comment.
I might have a solution, so before someone else wastes their time I'll just claim it for now.
gerritbot added a subscriber: gerritbot. gerritbot added a comment.
Change 212978 had a related patch set uploaded (by XZise): [FIX] replaceExcept: Replace references iteratively
https://gerrit.wikimedia.org/r/212978
gerritbot added a project: Patch-For-Review.
gerritbot added a comment.
Change 212978 merged by jenkins-bot: [FIX] replaceExcept: Replace references iteratively
https://gerrit.wikimedia.org/r/212978
jayvdb added a subscriber: jayvdb. jayvdb added a comment.
Is this fixed now, or are there other known corner cases?
Ricordisamoa edited subscribers, added: Ricordisamoa; removed: gerritbot. Ricordisamoa removed a project: Patch-For-Review.