[Pywikipedia-l] [ pywikipediabot-Bugs-1878986 ] getUrl() has a problem. No timeout?

SourceForge.net noreply at sourceforge.net
Sun Jan 27 18:16:39 UTC 2008


Bugs item #1878986, was opened at 2008-01-24 16:59
Message generated for change (Comment added) made by btongminh
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1878986&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Closed
Resolution: None
Priority: 7
Private: No
Submitted By: Filnik (filnik)
Assigned to: Nobody/Anonymous (nobody)
Summary: getUrl() has a problem. No timeout?

Initial Comment:
Hello, I've noticed that among my processes there are some scripts that were started something like 1-2 weeks ago and are still running.

The problem is that the getUrl() function in wikipedia.py never raises an error after a certain amount of time (at least, I suppose that's the reason; otherwise we have a bot that has been trying to fetch a page for a week for no particular reason...).

I haven't fixed the bug myself only because I have no idea how to fix it (I have never handled HTTP connections directly in Python), but Bryan has said:

<Bryan> yes, but that would require you to modify the socket settings
<Bryan> sock.settimeout(1500)
<Bryan> or you do select.select on the socket
<Bryan> which is very hard in pywiki

Any ideas? :-) The 1500, by the way, is just an example value; we should/can make it configurable in config.py. I've given this bug high priority because infinite loops on the toolserver are really a big problem.
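
Something along these lines, maybe (just a rough sketch, not tested; the config.socket_timeout name is only an example here, and 1500 is Bryan's placeholder value):

import socket
import config  # pywikipedia's config.py

# Apply a default timeout to every new socket, so that getUrl()
# eventually raises socket.timeout instead of blocking forever.
socket.setdefaulttimeout(getattr(config, 'socket_timeout', 1500))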

Thanks, Filnik

----------------------------------------------------------------------

>Comment By: Bryan (btongminh)
Date: 2008-01-27 19:16

Message:
Logged In: YES 
user_id=1806226
Originator: NO

It is a config setting as of r4944: config.socket_timeout
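
For anyone who wants a different value, it should be overridable like any other config.py option, e.g. in user-config.py (the number below is only an example; check config.py for the actual default):

# user-config.py
socket_timeout = 120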

----------------------------------------------------------------------

Comment By: Filnik (filnik)
Date: 2008-01-27 15:59

Message:
Logged In: YES 
user_id=1834469
Originator: YES

It seems that my scripts are also working correctly. Bug closed (thanks to
all :-))

----------------------------------------------------------------------

Comment By: Russell Blau (russblau)
Date: 2008-01-27 13:31

Message:
Logged In: YES 
user_id=855050
Originator: NO

OK to close.  I ran a lengthy script on my home machine that has had
timeout problems in the past, and it worked fine.

----------------------------------------------------------------------

Comment By: Filnik (filnik)
Date: 2008-01-25 14:54

Message:
Logged In: YES 
user_id=1834469
Originator: YES

OK, thanks russblau. Should I close the topic, or are you not 100% sure
that it has been fixed? :-) Bye, Filnik

----------------------------------------------------------------------

Comment By: Russell Blau (russblau)
Date: 2008-01-24 23:41

Message:
Logged In: YES 
user_id=855050
Originator: NO

Sorry, that last comment was me, and the revision was r4936

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-24 23:37

Message:
Logged In: NO 

Added a 120-second timeout in r4796; seems to work in initial testing.

The problem with the libcurl suggestion is that it would require every user of
every bot to download and install one or more third-party packages.


----------------------------------------------------------------------

Comment By: Francesco Cosoleto (cosoleto)
Date: 2008-01-24 18:21

Message:
Logged In: YES 
user_id=181280
Originator: NO

I am not sure that PyWikipediaBot causes intensive CPU usage on the Toolserver
because of this problem; anyway, to fix the missing-timeout problem temporarily,
there seems to be an easy solution:

import socket
import urllib
import urllib2

# With a 0.1-second default timeout both fetches time out ("[...]" = elided traceback):
socket.setdefaulttimeout(0.1)

urllib2.urlopen("http://cosoleto.free.fr").read()
[...]
urllib2.URLError: <urlopen error timed out>

urllib.urlopen("http://cosoleto.free.fr").read()
[...]
IOError: [Errno socket error] timed out

But I suggest libcurl (http://curl.haxx.se/libcurl/) to easily improve and
simplify the networking side of the PyWikipedia code. libcurl is a feature-rich
(persistent connections, transparent compression support, etc.) and
portable URL transfer library written in C. Why not?
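
For illustration only, a minimal sketch of what a libcurl-based fetch with a
timeout could look like through the pycurl bindings; nothing below is part of
the current code, and the 30-second values are arbitrary examples:

import StringIO

import pycurl

buf = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "http://cosoleto.free.fr")
c.setopt(pycurl.CONNECTTIMEOUT, 30)   # fail if connecting takes longer than 30 s
c.setopt(pycurl.TIMEOUT, 30)          # fail if the whole transfer takes longer than 30 s
c.setopt(pycurl.WRITEFUNCTION, buf.write)
c.perform()                           # raises pycurl.error on timeout
c.close()

data = buf.getvalue()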

----------------------------------------------------------------------

Comment By: Bryan (btongminh)
Date: 2008-01-24 17:06

Message:
Logged In: YES 
user_id=1806226
Originator: NO

Note that it would be much easier to call settimeout if persistent_http were
working. Unfortunately, it is not: I disabled it some time ago
(http://fisheye.ts.wikimedia.org/browse/pywikipedia/trunk/pywikipedia/wikipedia.py?r1=4638&r2=4641)
with a note that it needs investigation. Is anybody here willing to do that
investigation? It would not only solve Filnik's bug
(via site.conn.sock.settimeout), but it would also greatly improve performance
for single-threaded bots.
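
To illustrate the idea, a rough sketch only: with a working persistent connection
the timeout could be set once on the live socket. The hostname and the 120-second
value below are arbitrary, and conn stands in for the site.conn attribute
mentioned above.

import httplib
import socket

conn = httplib.HTTPConnection("en.wikipedia.org")
conn.connect()                   # .sock only exists after connect()
conn.sock.settimeout(120)        # e.g. config.socket_timeout

conn.request("GET", "/wiki/Special:Version")
try:
    data = conn.getresponse().read()
except socket.timeout:
    # the server stalled longer than the timeout; retry or give up
    # instead of blocking the bot forever
    data = None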

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1878986&group_id=93107


