Bugs item #2771272, was opened at 2009-04-17 21:24
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=277127…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: network
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: 44 Error Dump Files
Initial Comment:
python interwiki.py -autonomous -new:1000
Generated 44 SaxParseBug_wikipedia_...dump files, as in the attached zip file. Nightly
version of 14th April; ran on 17th April.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 16:08
Message:
Looks fixed. Closing...
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-30 11:02
Message:
Fair enough :)
I went ahead and committed in r6767 a check for '</mediawiki>' that should
prevent some, if not all, of these errors.
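A minimal sketch of what such a completeness check might look like (the function name, retry policy, and error handling here are illustrative assumptions, not the actual code committed in r6767):

```python
# Illustrative sketch only -- fetch_export and the retry policy are
# assumptions, not the code committed in r6767.
def get_complete_export(fetch_export, max_retries=3):
    """Retry a Special:Export request until the response ends with the
    closing </mediawiki> tag, i.e. until it looks complete."""
    for _ in range(max_retries):
        data = fetch_export()
        if data.rstrip().endswith('</mediawiki>'):
            return data  # complete XML export
    raise RuntimeError('Special:Export response still incomplete after retries')
```

Checking for the fixed closing tag avoids matching against localized, mutable error strings, which is exactly the point cosoleto raises below.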
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-28 15:08
Message:
Replacing the proper page contents with an HTML error message is such peculiar
behaviour that it shouldn't surprise you I hadn't noticed it. So fixing the
problem I reported probably won't resolve this bug, as the HTTP server sends a
'Content-Length' header value that matches the length of the received data.
Anyway, if I'm not wrong again, the received data should be terminated with
'</mediawiki>', so it's probably better to check for that than for English
strings that are mutable and could appear anywhere.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-28 05:16
Message:
Actually, to clarify, there are several reasons that might cause
SaxErrors:
* an XML parser bug (unlikely)
* communication issues: retrieving invalid or incomplete data, as cosoleto
mentions below
* a server outage, like the one that happened very recently: these days, some
Wikimedia servers were being taken out of rotation for upgrades, resulting
in a temporary database slave outage. This message was printed from time to
time: http://pastebin.com/f220d5ece
For edit actions, it doesn't matter: _get detects the invalid content and
retries. SaxErrors only happen in GetAll, when using Special:Export to
retrieve content. In this case, Special:Export returns revisions one by one,
and at some point during the query result generation it encounters a DB error
and cannot fetch a revision: the data returned by postData is then the
beginning of an XML file, containing the namespace information, a few
revisions... and at the end the HTML error message. This is the issue that
tieump tries to fix here.
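This failure mode can be reproduced directly: feeding such a truncated export (an XML prefix that ends in an HTML error message instead of the closing tags) to Python's SAX parser raises exactly this kind of error. A minimal sketch:

```python
import xml.sax

# An export that starts as valid XML but ends in an HTML error message,
# as happens when Special:Export hits a DB error mid-query.
truncated = b'<mediawiki><page><title>Foo</title>' \
            b'<html>No working slave server</html>'

try:
    xml.sax.parseString(truncated, xml.sax.ContentHandler())
except xml.sax.SAXParseException as e:
    print('SAX parse error:', e)
```

The parser fails because the document ends before the open elements are closed, regardless of what the trailing HTML says.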
----------------------------------------------------------------------
Comment By: Tieum P (tieump)
Date: 2009-04-28 04:55
Message:
This happens when some wikis send an error page. I posted a patch at
http://pastebin.com/m597b90e8, BUT there is a risk that if the string "No
working slave server" is a valid part of the article, we will be caught in
an infinite loop.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-19 09:32
Message:
postData() doesn't check the length of the data sent by the server, unlike
getUrl(), so the framework tries to parse truncated data and you get these
errors.
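For illustration, the kind of length check described could be sketched as follows (a hypothetical helper, not the framework's actual code; note that, as observed elsewhere in this thread, the server may declare a Content-Length that matches the truncated body, so this check alone cannot catch every case):

```python
# Hypothetical sketch of a Content-Length check (not pywikipedia's code).
def check_response_length(headers, body):
    """Raise if the body is shorter or longer than the declared Content-Length."""
    declared = headers.get('Content-Length')
    if declared is not None and len(body) != int(declared):
        raise IOError('Truncated response: got %d of %s bytes'
                      % (len(body), declared))
    return body
```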
----------------------------------------------------------------------
Comment By: Mikko Silvonen (silvonen)
Date: 2009-04-19 07:15
Message:
These dump files are generated more frequently when the Wikipedia servers
have database problems (as they have had for the last few days).
----------------------------------------------------------------------