[Pywikipedia-l] [ pywikipediabot-Bugs-1878986 ] getUrl() has a problem. No timeout?

SourceForge.net noreply at sourceforge.net
Thu Jan 24 17:21:45 UTC 2008


Bugs item #1878986, was opened at 2008-01-24 16:59
Message generated for change (Comment added) made by cosoleto
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1878986&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Filnik (filnik)
Assigned to: Nobody/Anonymous (nobody)
Summary: getUrl() has a problem. No timeout?

Initial Comment:
Hello, I've seen that among my processes there are some scripts that were started something like 1-2 weeks ago and are still running.

The problem is that the getUrl() function of wikipedia.py doesn't raise any error after a certain time (or at least I suppose that's the reason; otherwise we have a bot that has been trying to get a page for a week without a specific reason...).

I haven't fixed the bug myself only because I have no idea how to fix it (I have never handled HTTP connections directly in Python), but Bryan has said:

<Bryan> yes, but that would require you to modify the socket settings
<Bryan> sock.settimeout(1500)
<Bryan> or you do select.select on the socket
<Bryan> which is very hard in pywiki

Any ideas? :-) The 1500, by the way, is only an example value; we should/can make it configurable in config.py. I've given this bug high priority because infinite loops on the toolserver are really a big problem.
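A minimal sketch of what such a global timeout could look like, assuming a hypothetical socket_timeout option in config.py (it does not exist yet):

import socket
import config  # pywikipedia's config module

# 'socket_timeout' is a hypothetical option; config.py does not define it yet.
timeout = getattr(config, 'socket_timeout', 1500)

# Applies to every socket created afterwards, including the ones
# urllib/urllib2 open inside getUrl(); a stalled read then raises
# socket.timeout instead of blocking forever.
socket.setdefaulttimeout(timeout)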

Thanks, Filnik

----------------------------------------------------------------------

>Comment By: Francesco Cosoleto (cosoleto)
Date: 2008-01-24 18:21

Message:
Logged In: YES 
user_id=181280
Originator: NO

I am not sure PyWikipediaBot causes intensive CPU usage on the Toolserver due to this problem. Anyway, to temporarily fix the missing timeout, there seems to be this easy solution:

import socket
import urllib
import urllib2

# A global default timeout (in seconds) for all sockets created afterwards:
socket.setdefaulttimeout(0.1)

urllib2.urlopen("http://cosoleto.free.fr").read()
[...]
urllib2.URLError: <urlopen error timed out>

urllib.urlopen("http://cosoleto.free.fr").read()
[...]
IOError: [Errno socket error] timed out

But I suggest libcurl (http://curl.haxx.se/libcurl/) to easily improve and simplify the network side of the PyWikipedia code. libcurl is a feature-rich (persistent connections, transparent compression support, etc.) and portable URL transfer library written in C. Why not?
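For illustration, a rough sketch of a timed fetch through the pycurl bindings; the URL and timeout values are just placeholders:

import pycurl
from StringIO import StringIO  # Python 2, as in the examples above

buf = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "http://cosoleto.free.fr")
c.setopt(pycurl.CONNECTTIMEOUT, 30)    # give up if connecting takes too long
c.setopt(pycurl.TIMEOUT, 1500)         # abort the whole transfer after 1500 s
c.setopt(pycurl.WRITEFUNCTION, buf.write)
c.perform()                            # raises pycurl.error on timeout
c.close()
print buf.getvalue()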

----------------------------------------------------------------------

Comment By: Bryan (btongminh)
Date: 2008-01-24 17:06

Message:
Logged In: YES 
user_id=1806226
Originator: NO

Note that it would be much easier to do settimeout if persistent_http were working. Unfortunately, it is not: I disabled it some time ago
(http://fisheye.ts.wikimedia.org/browse/pywikipedia/trunk/pywikipedia/wikipedia.py?r1=4638&r2=4641)
saying that it needs investigation. Is anybody here willing to do this
investigation? It would not only solve Filnik's bug
(site.conn.sock.settimeout), but it would also greatly improve performance
for single-threaded bots.
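To sketch what Bryan means, assuming persistent_http kept an httplib connection around in site.conn (as the site.conn.sock.settimeout hint suggests), a per-connection timeout could look roughly like this:

import httplib

conn = httplib.HTTPConnection("en.wikipedia.org")  # placeholder host
conn.connect()                  # opens the underlying socket
conn.sock.settimeout(1500)      # timeout (in seconds) for this socket only
conn.request("GET", "/wiki/Main_Page")
response = conn.getresponse()
data = response.read()          # raises socket.timeout if the server stalls

With a persistent connection the timeout is set once and reused for every request, instead of relying on a process-wide socket.setdefaulttimeout().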

----------------------------------------------------------------------
