Bugs item #2922193 was opened at 2009-12-28 11:56
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=292219…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: masti (masti01)
Assigned to: xqt (xqt)
Summary: interwiki fails on pl.wikibooks
Initial Comment:
While running interwiki.py on pl.wikibooks, the script goes through pages, but from time to
time it stalls with the error "Received incomplete XML data. Sleeping for 15 seconds..."
and stays like that forever.
example:
$python pywikipedia/interwiki.py -start:"H. K. T." -auto
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 60 pages from wikisource:pl...
[then some output about processing]
[and then ...]
======Post-processing [[pl:Ha! jeszcze o mnie...]]======
Updating links on page [[pl:Ha! jeszcze o mnie...]].
No changes needed
======Post-processing [[pl:H. K. T.]]======
Updating links on page [[pl:H. K. T.]].
No changes needed
NOTE: The first unfinished subject is [[pl:Had we never loved so kindly]]
NOTE: Number of pages queued is 10, trying to add 60 more.
Getting 60 pages from wikisource:pl...
Received incomplete XML data. Sleeping for 15 seconds...
Received incomplete XML data. Sleeping for 30 seconds...
Received incomplete XML data. Sleeping for 45 seconds...
Received incomplete XML data. Sleeping for 60 seconds...
Received incomplete XML data. Sleeping for 75 seconds...
Received incomplete XML data. Sleeping for 135 seconds...
Received incomplete XML data. Sleeping for 195 seconds...
Received incomplete XML data. Sleeping for 255 seconds...
Received incomplete XML data. Sleeping for 315 seconds...
and so on ...
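The growing sleep intervals in the log above are the bot's incomplete-data retry loop. A minimal, hypothetical sketch of that additive-backoff pattern (not the actual pywikipedia code; `fetch` here stands in for the XML-export request):

```python
import time

def fetch_with_backoff(fetch, step=15, max_wait=315):
    """Call fetch() until it returns data, sleeping progressively
    longer after each incomplete (None) response."""
    wait = step
    while True:
        data = fetch()
        if data is not None:  # complete response received
            return data
        print("Received incomplete XML data. Sleeping for %d seconds..." % wait)
        time.sleep(wait)
        wait = min(wait + step, max_wait)  # additive backoff, capped
```

Note that a loop like this never gives up, which matches the symptom in the report: when the server truncates the response every time, the bot "stays like that forever".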
python pywikipedia/version.py
Pywikipedia (r7830 (wikipedia.py), 2009/12/27, 14:20:21)
Python 2.6.2 (r262:71600, Aug 21 2009, 12:22:21)
[GCC 4.4.1 20090818 (Red Hat 4.4.1-6)]
----------------------------------------------------------------------
Comment By: masti (masti01)
Date: 2009-12-29 23:07
Message:
I have split the page, which was requested anyway, and with pages of approx.
800 kB the bot works properly. So for this particular case the workaround works,
and I think we can close the issue, although it looks like a MediaWiki
problem, not a pywikibot problem. Thanks for pointing me to the real cause :)
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-12-29 22:36
Message:
Maybe, but I am not sure. I could try to get the requested pages via the API
instead of XML export, but there is some irregularity with eo-x-encoding.
This is the reason why it is not implemented yet, except for some developer
hacks. Perhaps it will run next year ;)
----------------------------------------------------------------------
Comment By: masti (masti01)
Date: 2009-12-29 20:27
Message:
After exporting I see that it ends right after the title and the first <id> tag:
<title>Historyja literatury angielskiej</title>
<id>14057</id>
One other thing: this page is huge, 1469 kB. Maybe that is a problem with
exporting?
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-12-29 19:55
Message:
You can try to export the page via [[s:pl:Specjalna:Eksport]]. That does not
work either. Maybe the page is corrupt. We should ask in the tech channel.
----------------------------------------------------------------------
Comment By: masti (masti01)
Date: 2009-12-29 19:49
Message:
You're right, the output for this page is:
'\n <id>14057</id>\n'
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-12-29 19:35
Message:
I guess this is a problem with the page 'Historyja literatury angielskiej'.
You may try the statements below, but with
page=wikipedia.Page(site, 'Historyja literatury angielskiej')
It won't work. But I have no idea why for now.
----------------------------------------------------------------------
Comment By: masti (masti01)
Date: 2009-12-29 15:51
Message:
Yes, it works:
[mst@pl37007 pywikipedia]$ python
Python 2.6.2 (r262:71600, Aug 21 2009, 12:22:21)
[GCC 4.4.1 20090818 (Red Hat 4.4.1-6)] on linux2
Type "help", "copyright", "credits" or "license"
for more information.
>>> import wikipedia
>>> site = wikipedia.getSite('pl','wikisource')
>>> ga = wikipedia._GetAll(site, pages=[], throttle=0, force=True)
>>> page=wikipedia.Page(site, 'H. K. T.')
>>> ga.pages = [page]
>>> data=ga.getData()
>>> data[-20:]
'/page>\n</mediawiki>\n'
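The `data[-20:]` check works because a complete MediaWiki XML export always closes the root element, so the dump must end with `</mediawiki>`; the broken page's export instead stopped right after the first `<id>` element. A small helper capturing that completeness test (hypothetical, for illustration only):

```python
def is_complete_export(data):
    """Return True if the XML export ends with the root closing tag,
    i.e. the dump was not truncated mid-stream."""
    return data.rstrip().endswith("</mediawiki>")

# complete dump, as in the working 'H. K. T.' case above
print(is_complete_export('/page>\n</mediawiki>\n'))    # True
# truncated dump, as in the 'Historyja literatury angielskiej' case
print(is_complete_export('\n      <id>14057</id>\n'))  # False
```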
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-12-29 09:16
Message:
Could you test the following statements in your Python shell (IDLE):
>>> import wikipedia
>>> site = wikipedia.getSite('pl', 'wikisource')
>>> ga = wikipedia._GetAll(site, pages=[], throttle=0, force=True)
>>> page=wikipedia.Page(site, 'H. K. T')
>>> ga.pages = [page]
>>> data=ga.getData()
>>> data[-20:]
As a result you should see this:
'einfo>\n</mediawiki>\n'
>>>
----------------------------------------------------------------------
Comment By: masti (masti01)
Date: 2009-12-28 18:50
Message:
One thing was wrong: it's pl.wikisource, not wikibooks. This error has persisted
for some months already, and it happens every time I run the bot, so it
should not be due to server performance. I was running the bot on several
other projects today as well, and this happens only on pl.wikisource. This
is just an example; the bot stalls on several different articles.
I am using the API.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-12-28 18:38
Message:
I found the server was very slow today for pl sites; I had a similar delay
on pl-wiki. Maybe it will work later.
On the other hand, you can try it via the API. Does that work?
----------------------------------------------------------------------