Bugs item #2771272, was opened at 2009-04-17 21:24
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2771272...

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: network
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: 44 Error Dump Files
Initial Comment: python interwiki.py -autonomous -new:1000
Generated 44 SaxParseBug_wikipedia_... dump files, as in the attached zip file. Nightly version of 14th April, run on 17th April.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 16:08
Message: Looks fixed. Closing...
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-30 11:02
Message: Fair enough :)
I went ahead and committed in r6767 a check for '</mediawiki>' that should prevent some, if not all, of these errors.
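For reference, the shape of such a check is roughly the following. This is a sketch only, not the actual r6767 diff; the function and variable names are invented for illustration:

    # Sketch only -- not the actual r6767 change. Assumes 'data' holds the
    # raw text that postData() received from Special:Export.
    def looks_complete(data):
        # A well-formed Special:Export response ends with the closing
        # </mediawiki> tag; a truncated stream, or one with an HTML error
        # page appended, will not, so it can be treated as invalid and
        # the request retried.
        return data.rstrip().endswith('</mediawiki>')

The advantage over matching an English error string, as cosoleto notes below, is that the closing tag is stable across wikis and languages.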
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-28 15:08
Message: Replacing the proper page contents with an HTML error message is such peculiar behaviour that it shouldn't surprise you I hadn't noticed it. So fixing the problem I reported probably won't resolve this bug, as the HTTP server sends a 'Content-Length' header value that matches the length of the received data.
Anyway, if I am not wrong again, the received data should end with '</mediawiki>', so it is probably better to check for that than for English strings that can change and appear anywhere in the response.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-28 05:16
Message: Actually, to clarify, there are several reasons that might cause SaxErrors:
* an xmlparser bug (unlikely)
* communication issues: retrieving invalid or incomplete data, as cosoleto mentions below
* a server outage, as happened very recently: these days some Wikimedia servers were being taken out of rotation for upgrade, resulting in temporary database slave outages; this http://pastebin.com/f220d5ece message was printed from time to time.
For edit actions it doesn't matter: _get detects the invalid content and retries. SaxErrors only happen in GetAll, when using Special:Export to retrieve content. In this case, Special:Export returns revisions one by one and, at some point during the query result generation, encounters a DB error and cannot fetch a revision: the data returned by postData is then the beginning of an XML file, containing the namespace information, a few revisions... and, at the end, the HTML error message. This is the issue that tieump tries to fix here.
----------------------------------------------------------------------
Comment By: Tieum P (tieump)
Date: 2009-04-28 04:55
Message: This happens when some wikis send an error page. I posted a patch at http://pastebin.com/m597b90e8 BUT there is a risk that if the string "No working slave server" is a valid part of the article, we will be caught in an infinite loop.
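The pastebin patch isn't reproduced here; one way to keep that idea while avoiding the infinite-loop risk is to cap the number of retries, roughly as in this sketch (the function and names are illustrative, not the actual patch):

    import time

    MAX_RETRIES = 5  # illustrative limit, not taken from the patch

    def fetch_export(post_data, site, params):
        """Retry a Special:Export request while it looks like a DB error page,
        but give up after MAX_RETRIES so a page that legitimately contains
        the error string cannot trap us in an endless loop."""
        data = post_data(site, params)
        for attempt in range(MAX_RETRIES):
            if 'No working slave server' not in data:
                break
            time.sleep(5)  # brief pause before retrying
            data = post_data(site, params)
        return data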
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-19 09:32
Message: postData() doesn't check the length of the data sent by the server, unlike getUrl(), so the framework tries to parse truncated data and then you get errors.
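For comparison, the kind of length check getUrl() performs, and postData() skips, amounts to something like this (a sketch; the dict-style header access is an assumption, not the framework's actual code):

    def received_all(headers, body):
        # Compare the server's announced Content-Length with what was
        # actually read, so a truncated reply is caught before it ever
        # reaches the XML parser.
        expected = headers.get('Content-Length')
        if expected is None:
            return True  # nothing to compare against
        return len(body) == int(expected)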
----------------------------------------------------------------------
Comment By: Mikko Silvonen (silvonen)
Date: 2009-04-19 07:15
Message: These dump files are generated more frequently when the Wikipedia servers have database problems (as they have had for the last few days).
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2771272...
pywikipedia-bugs@lists.wikimedia.org