Bugs item #2922193 was opened at 2009-12-28 11:56
Message generated for change (Settings changed) made by xqt
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2922193...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: interwiki
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: masti (masti01)
Assigned to: xqt (xqt)
Summary: interwiki fails on pl.wikibooks
Initial Comment: While running interwiki.py on pl.wikibooks, the script goes through pages, but from time to time it stalls with the error "Received incomplete XML data. Sleeping for 15 seconds..." and stays like that forever.
example:

$ python pywikipedia/interwiki.py -start:"H. K. T." -auto
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 60 pages from wikisource:pl...
[then some output about processing]
[and then ...]
======Post-processing [[pl:Ha! jeszcze o mnie...]]======
Updating links on page [[pl:Ha! jeszcze o mnie...]].
No changes needed
======Post-processing [[pl:H. K. T.]]======
Updating links on page [[pl:H. K. T.]].
No changes needed
NOTE: The first unfinished subject is [[pl:Had we never loved so kindly]]
NOTE: Number of pages queued is 10, trying to add 60 more.
Getting 60 pages from wikisource:pl...
Received incomplete XML data. Sleeping for 15 seconds...
Received incomplete XML data. Sleeping for 30 seconds...
Received incomplete XML data. Sleeping for 45 seconds...
Received incomplete XML data. Sleeping for 60 seconds...
Received incomplete XML data. Sleeping for 75 seconds...
Received incomplete XML data. Sleeping for 135 seconds...
Received incomplete XML data. Sleeping for 195 seconds...
Received incomplete XML data. Sleeping for 255 seconds...
Received incomplete XML data. Sleeping for 315 seconds...
and so on ...
$ python pywikipedia/version.py
Pywikipedia (r7830 (wikipedia.py), 2009/12/27, 14:20:21)
Python 2.6.2 (r262:71600, Aug 21 2009, 12:22:21) [GCC 4.4.1 20090818 (Red Hat 4.4.1-6)]
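The growing sleep intervals in the log above suggest a retry loop whose delay increases after each failed attempt. A minimal sketch of such a loop, matching the observed 15-second steps (then 60-second steps after the fifth retry), is below; the helper name and structure are assumptions, not pywikipedia's actual code:

    import time

    def fetch_with_retry(fetch, max_attempts=10):
        # 'fetch' is any callable returning the raw XML export as a string
        # (hypothetical helper; not pywikipedia's actual implementation).
        delay = 15  # seconds; grows after every failed attempt
        for attempt in range(max_attempts):
            data = fetch()
            if data.rstrip().endswith('</mediawiki>'):  # export looks complete
                return data
            print 'Received incomplete XML data. Sleeping for %d seconds...' % delay
            time.sleep(delay)
            delay += 15 if attempt < 4 else 60  # log shows 15 s steps, then 60 s
        raise RuntimeError('still incomplete after %d attempts' % max_attempts)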
----------------------------------------------------------------------
Comment By: masti (masti01) Date: 2009-12-29 23:07
Message: I have split the page, which had been requested anyway, and with pages of approx. 800 kB the bot works properly. So for this particular case the workaround works and I think we can close the issue, although it looks like a MediaWiki problem, not a pywikibot problem. Thanks for pointing me to the real cause :)
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2009-12-29 22:36
Message: Maybe, but I am not sure. I could try to get the requested pages via the API instead of XML export, but there is some irregularity with the eo-x-encoding. This is the reason why it is not implemented yet, apart from some developer hacks. Perhaps it will run next year ;)
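For illustration, a minimal sketch of fetching a page's wikitext through api.php with only the standard library; the helper name is made up, and this is not the pywikipedia implementation referred to above:

    import urllib, urllib2, json

    def fetch_via_api(host, title):
        # Standard MediaWiki API query for the latest revision's wikitext.
        params = urllib.urlencode({
            'action': 'query',
            'prop': 'revisions',
            'rvprop': 'content',
            'titles': title.encode('utf-8'),
            'format': 'json',
        })
        reply = urllib2.urlopen('http://%s/w/api.php?%s' % (host, params))
        pages = json.load(reply)['query']['pages']
        return pages.values()[0]['revisions'][0]['*']  # raw wikitext

    text = fetch_via_api('pl.wikisource.org', u'H. K. T.')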
----------------------------------------------------------------------
Comment By: masti (masti01) Date: 2009-12-29 20:27
Message: After exporting I see that it ends just after the title and the first <id> tag:
<title>Historyja literatury angielskiej</title>
<id>14057</id>
One other thing: this page is huge, 1469 kB. Maybe that is a problem with the export?
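One way to spot such oversized pages up front is to ask the API for the page length before exporting. A sketch, with a hypothetical helper and a ~1 MB threshold assumed from this report:

    import urllib, urllib2, json

    def page_length(host, title):
        # prop=info returns, among other fields, the page size in bytes.
        params = urllib.urlencode({
            'action': 'query',
            'prop': 'info',
            'titles': title.encode('utf-8'),
            'format': 'json',
        })
        reply = urllib2.urlopen('http://%s/w/api.php?%s' % (host, params))
        return json.load(reply)['query']['pages'].values()[0]['length']

    if page_length('pl.wikisource.org', u'Historyja literatury angielskiej') > 1024 * 1024:
        print 'very large page; the XML export may come back truncated'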
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2009-12-29 19:55
Message: You can try to export this page via [[s:pl:Specjalna:Eksport]]. That is not possible either. Maybe the page is corrupt. We should ask on the tech channel.
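A complete export always closes with a </mediawiki> tag, so the result can be checked directly. A minimal sketch using the standard Specjalna:Eksport/<title> URL form:

    import urllib, urllib2

    # Fetch the page through the export special page and test whether
    # the dump actually closes with </mediawiki>.
    title = urllib.quote('Historyja literatury angielskiej')
    url = 'http://pl.wikisource.org/wiki/Specjalna:Eksport/' + title
    xml = urllib2.urlopen(url).read()
    print 'complete' if xml.rstrip().endswith('</mediawiki>') else 'truncated'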
----------------------------------------------------------------------
Comment By: masti (masti01) Date: 2009-12-29 19:49
Message: You're right, the output for this page is:
'\n <id>14057</id>\n'
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2009-12-29 19:35
Message: I guess this is a problem with the page 'Historyja literatury angielskiej'. You may try the statements below, but with
page = wikipedia.Page(site, 'Historyja literatury angielskiej')
It won't work. But I have no idea why for now.
----------------------------------------------------------------------
Comment By: masti (masti01) Date: 2009-12-29 15:51
Message: Yes, it works:
[mst@pl37007 pywikipedia]$ python
Python 2.6.2 (r262:71600, Aug 21 2009, 12:22:21)
[GCC 4.4.1 20090818 (Red Hat 4.4.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wikipedia
>>> site = wikipedia.getSite('pl', 'wikisource')
>>> ga = wikipedia._GetAll(site, pages=[], throttle=0, force=True)
>>> page = wikipedia.Page(site, 'H. K. T.')
>>> ga.pages = [page]
>>> data = ga.getData()
>>> data[-20:]
'/page>\n</mediawiki>\n'
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2009-12-29 09:16
Message: Could you test the following statements in your python shell (idle):
import wikipedia
site = wikipedia.getSite('pl', 'wikisource')
ga = wikipedia._GetAll(site, pages=[], throttle=0, force=True)
page = wikipedia.Page(site, 'H. K. T')
ga.pages = [page]
data = ga.getData()
data[-20:]
As a result you should see this:
'einfo>\n</mediawiki>\n'
----------------------------------------------------------------------
Comment By: masti (masti01) Date: 2009-12-28 18:50
Message: One thing was wrong: it's pl.wikisource, not wikibooks. This error has persisted for some months already, and it happens every time I run the bot, so it should not be due to server performance. I was running the bot on several other projects today as well, and this happens only on pl.wikisource. This is just an example; the bot stalls on several different articles.
I am using the API.
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2009-12-28 18:38
Message: I found the server was very slow today for the pl sites. I had a similar delay on pl-wiki. Maybe it will work later.
On the other hand, you can try going via the API. Does that work?
----------------------------------------------------------------------