https://bugzilla.wikimedia.org/show_bug.cgi?id=55219
Web browser: ---
Bug ID: 55219
Summary: Updating complex pages
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1399/
Reported by: malafaya
Created on: 2012-01-17 00:22:50
Subject: Updating complex pages
Original description:
When updating complex pages, it's common to get a timeout, because the Wikimedia server does not process and return the page within the expected time. In such cases (when a timeout exception is thrown), my suggestion is that pywikipedia should fetch the page again and check whether there are any differences from the new page to be saved. If not, then it should proceed and not block indefinitely on such pages.
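The suggestion above could be sketched roughly as follows. This is only an illustration of the idea, not pywikipedia's actual code: `fetch_text` and `put_text` are hypothetical callables standing in for whatever reads and writes the page, and `SaveTimeout` is a made-up exception representing a timeout (or a 503/504 response) during the save.

```python
class SaveTimeout(Exception):
    """Hypothetical stand-in for a timeout / 503 / 504 raised while saving."""


def save_with_check(fetch_text, put_text, new_text, max_retries=3):
    """Try to save new_text; after a timeout, re-fetch before retrying.

    fetch_text() -> str and put_text(str) -> None are assumed callables,
    not real pywikipedia functions.
    Returns the number of put attempts that were made.
    """
    for attempt in range(1, max_retries + 1):
        try:
            put_text(new_text)
            return attempt
        except SaveTimeout:
            # The server may have applied the edit without answering in time,
            # so compare the live page with the text we wanted to save.
            if fetch_text() == new_text:
                return attempt
    raise SaveTimeout("gave up after %d attempts" % max_retries)
```

A plain string comparison is the simplest possible check; if the server normalizes the text on save, a real implementation would probably need a more tolerant comparison (or a look at the page history instead).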
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- This is the way the bot works. It tries to put the page several times; the number of attempts is given by maxretries in (user_)config.py. Edit conflicts are detected (by the MW API) unless you are using your bot account for multiple edits on the same page at the same time.
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: open --> pending
--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Hmmm, I'm not sure you understood. I'm not updating the page more than once simultaneously. It's just one bot run. As the page is a complicated one, the server does not respond in time (you can try [[Europa]] at pt.wiktionary). The bot then tries again, but obviously the same happens. The difference is that the page has already been updated on the first try, even if the server has not responded. In operations such as replace.py, where it's common to edit long pages, you end up in a long loop.
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: pending --> open
--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I'm talking about this error:
Updating page [[Sri Lanka]] via API
HTTPError: 504 Gateway Time-out
The page to be updated is quite big, so the server does not reply in time.
1) Is there a way to increase the timeout? I believe this is controlled by the server, not the HTTP client...
2) The page was updated on the first try, but as the page is not refreshed between retries, the bot doesn't know and will try to update it "forever".
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:
What     |Removed |Added
----------------------------------------------------------------------------
See Also |        |https://sourceforge.net/p/pywikipediabot/bugs/1399
xqt info@gno.de changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |daniel@schwen.de
--- Comment #6 from xqt info@gno.de --- *** Bug 56884 has been marked as a duplicate of this bug. ***
Fæ faebug@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |faebug@gmail.com
--- Comment #7 from Fæ faebug@gmail.com --- Checking this morning with Faebot, 1.6% of get/put transactions have failed out of a sample of more than 1,000. These were small size category changes rather than file uploads or large page edits. I believe most failures have been on putting pages rather than getting them, but I have seen getting pages causing this failure.
As everyone appears affected, not just API users, I have asked for feedback at the Village pump (http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&diff...).
I am not convinced that this is a pywikipediabot-specific problem: it does not relate to any changes in pywikipediabot, which has never before had this problem with this frequency, so the bug report (1399) above may well be a dead end.
--- Comment #8 from zhuyifei1999 zhuyifei1999@gmail.com --- 503 is also happening:
Sleeping for 7.9 seconds, 2013-11-13 11:20:55
Updating page [[File:Русский энциклопедический словарь Березина 4.2 077.jpg]] via API
Result: 503 Service Unavailable
Traceback (most recent call last):
  (hidden)
  File "(hidden)/pywikipedia/wikipedia.py", line 2242, in put
    sysop=sysop, botflag=botflag, maxTries=maxTries)
  File "(hidden)/pywikipedia/wikipedia.py", line 2339, in _putPage
    back_response=True)
  File "(hidden)/pywikipedia/pywikibot/support.py", line 121, in wrapper
    return method(*__args, **__kw)
  File "(hidden)/pywikipedia/query.py", line 138, in GetData
    site.cookies(sysop=sysop))
  File "(hidden)/pywikipedia/wikipedia.py", line 6977, in postForm
    cookies=cookies)
  File "(hidden)/pywikipedia/wikipedia.py", line 7021, in postData
    f = MyURLopener.open(request)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Unavailable
iDangerMouse humayunmirza88@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |humayunmirza88@gmail.com
--- Comment #9 from iDangerMouse humayunmirza88@gmail.com --- The problem is still ongoing
Andre Klapper aklapper@wikimedia.org changed:
What    |Removed                |Added
----------------------------------------------------------------------------
Summary |Updating complex pages |Timeout when updating complex pages
Morten Wang warnckew@online.no changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |warnckew@online.no
--- Comment #10 from Morten Wang warnckew@online.no --- I'd like to second this. When saving large complex pages, I frequently get 503 responses. As Daniel Schwen notes in bug 56884, it would be great to be able to tell Pywikibot to _not_ retry and instead manually check if the edit went through.
--- Comment #11 from Morten Wang warnckew@online.no --- I patched my local copy of Pywikibot core, adding a max_retries parameter to editpage() to only allow it to attempt an edit once. No changes to other files appear necessary since Page.save() passes on any additional parameters. Should I propose that as a patch? If so, what format is preferred?
--- Comment #12 from Merlijn van Deen valhallasw@arctus.nl --- If you could upload it to gerrit (either via git directly, or via the patch uploader at https://tools.wmflabs.org/gerrit-patch-uploader/ ), that would be really nice.
I'm a bit confused however, as data.api.Request seems to get max_retries from the config file. Does it get passed another value of max_retries somewhere? I can't find where that would be...
--- Comment #13 from Morten Wang warnckew@online.no --- data.api.Request does kwargs.pop(), so if it gets instantiated with a max_retries parameter it will use that value, otherwise it reads the config parameter.
In my case I found that I can just set pywikibot.config.max_retries instead of passing it as a parameter to Page.save(). Arguably nicer than passing a parameter around, which requires some way of handling a default value. Sorry about not figuring that out earlier.
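For anyone following along, the fallback described here looks roughly like the pattern below. This is a self-contained sketch, not the real data.api.Request: the `config` object, the toy `Request` class and the `override` helper are stand-ins written for illustration, and the default of 25 is made up for the example.

```python
import contextlib
import types

# Stand-in for pywikibot.config; the default value is made up for the example.
config = types.SimpleNamespace(max_retries=25)


class Request:
    """Toy request that picks up max_retries the way the comment describes."""

    def __init__(self, **kwargs):
        # An explicit keyword wins; otherwise fall back to the config value.
        self.max_retries = kwargs.pop("max_retries", config.max_retries)
        self.params = kwargs


@contextlib.contextmanager
def override(obj, name, value):
    """Temporarily set obj.<name> to value and restore it afterwards."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, old)


# Option 1: pass the value explicitly.
explicit = Request(action="edit", max_retries=1)

# Option 2: change the config value only around the call that needs it.
with override(config, "max_retries", 2):
    from_config = Request(action="edit")
```

Setting the config value globally, as in the comment, is the shortest route; wrapping it in something like `override` merely limits the change to the save that should not be retried.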
Merlijn van Deen valhallasw@arctus.nl changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |valhallasw@arctus.nl
--- Comment #14 from Merlijn van Deen valhallasw@arctus.nl --- I'm still a bit confused by Daniel's comment:
> Now pywikipediabot tries again by itself an apparently infinite amount of times
> Despite having set max_retries to 2 in my user-config.py
but this does seem to work for me (at least: setting max_retries in user-config.py sets pywikibot.config.max_retries). Strange.
--- Comment #15 from Daniel Schwen daniel@schwen.de --- Ahhrgh! I changed the max_retries setting in ./user-config.py but core reads ~/.pywikibot/user-config.py
Sorry. Will try again with the new setting.
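One way to catch this kind of mix-up is to list every user-config.py that could be in play, together with the max_retries value each one sets. The helper below is a hypothetical diagnostic, not part of Pywikibot, and it makes no claim about which file Pywikibot actually loads; it only reports what it finds in the directories it is given.

```python
import os
import re

# Matches a top-level assignment such as "max_retries = 2".
_ASSIGN = re.compile(r"^max_retries\s*=\s*(\d+)", re.MULTILINE)


def find_max_retries(directories, filename="user-config.py"):
    """Return {path: value or None} for each config file that exists.

    value is the last integer assigned to max_retries in that file,
    or None if the file does not set it.
    """
    found = {}
    for directory in directories:
        path = os.path.join(os.path.expanduser(directory), filename)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            matches = _ASSIGN.findall(handle.read())
        found[path] = int(matches[-1]) if matches else None
    return found
```

Calling `find_max_retries([".", "~/.pywikibot"])` would cover the two locations mentioned in this comment.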
Bawolff (Brian Wolff) bawolff+wn@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |bawolff+wn@gmail.com
--- Comment #16 from Bawolff (Brian Wolff) bawolff+wn@gmail.com --- On the wikimedia side see also bug 57026. (Not a dupe since Pywikipedia should also handle these situations gracefully.)
--- Comment #17 from Ricordisamoa ricordisamoa@live.it --- *** Bug 55162 has been marked as a duplicate of this bug. ***
--- Comment #18 from Amir Ladsgroup ladsgroup@gmail.com --- *** Bug 55179 has been marked as a duplicate of this bug. ***
Amir Ladsgroup ladsgroup@gmail.com changed:
What     |Removed       |Added
----------------------------------------------------------------------------
Priority |Unprioritized |High
CC       |              |ladsgroup@gmail.com
Ricordisamoa ricordisamoa@openmailbox.org changed:
What      |Removed |Added
----------------------------------------------------------------------------
CC        |        |ricordisamoa@openmailbox.org
Component |General |network