Bugs item #1878986, was opened at 2008-01-24 07:59
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1878986...

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: General
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Filnik (filnik)
Assigned to: Nobody/Anonymous (nobody)
Summary: getUrl() has a problem. No timeout?
Initial Comment: Hello, I've noticed that among my processes there are scripts that were started roughly 1-2 weeks ago and are still running.
The problem is that the getUrl() function in wikipedia.py never raises an error after some amount of time (or at least I suppose that's the reason; otherwise we have a bot that has been trying to fetch a page for a week for no particular reason...).
I haven't fixed the bug myself only because I have no idea how to fix it (I have never handled HTTP connections directly in Python), but Bryan said:
<Bryan> yes, but that would require you to modify the socket settings
<Bryan> sock.settimeout(1500)
<Bryan> or you do select.select on the socket
<Bryan> which is very hard in pywiki
Some ideas? :-) The 1500, by the way, is just an example value; we should/could make it configurable in config.py. I've given this bug high priority because infinite loops on the Toolserver are a really big problem.
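A minimal sketch of the settimeout idea wired through config.py (the socket_timeout option name and the default are illustrative assumptions, not existing pywikipedia settings):

    import socket
    import config  # pywikipedia's config module

    # Hypothetical option; would need to be added to config.py.
    timeout = getattr(config, 'socket_timeout', 1500)

    # Applies to every socket created afterwards, including the ones
    # httplib/urllib open inside getUrl(), which would then raise
    # socket.timeout instead of blocking forever.
    socket.setdefaulttimeout(timeout)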
Thanks, Filnik
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody) Date: 2008-01-24 14:37
Message: Logged In: NO
Added a 120-second timeout in r4796; seems to work in initial testing.
The problem with the libcurl suggestion is that it would require every user of every bot to download and install one or more third-party packages.
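For reference, a per-connection timeout of that kind can be set directly on the underlying socket (a sketch only; the actual r4796 change is not reproduced here):

    import httplib

    conn = httplib.HTTPConnection('en.wikipedia.org')
    conn.connect()
    # After 120 seconds with no data, recv() raises socket.timeout
    # instead of blocking indefinitely.
    conn.sock.settimeout(120)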
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto) Date: 2008-01-24 09:21
Message: Logged In: YES user_id=181280 Originator: NO
I am not sure PyWikipediaBot causes intensive CPU usage on the Toolserver because of this problem. Anyway, to fix the missing timeout temporarily, there seems to be an easy solution:
    import socket
    import urllib, urllib2

    socket.setdefaulttimeout(0.1)

    urllib2.urlopen("http://cosoleto.free.fr").read()
    [...]
    urllib2.URLError: <urlopen error timed out>

    urllib.urlopen("http://cosoleto.free.fr").read()
    [...]
    IOError: [Errno socket error] timed out
But I suggest libcurl (http://curl.haxx.se/libcurl/) to easily improve and simplify the networking side of the PyWikipedia code. libcurl is a feature-rich (persistent connections, transparent compression support, etc.) and portable URL transfer library written in C. Why not?
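With the pycurl bindings, for instance, timeouts are plain options (a sketch, assuming pycurl is installed; the option values are illustrative):

    import pycurl
    import StringIO

    buf = StringIO.StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, 'http://cosoleto.free.fr')
    c.setopt(pycurl.CONNECTTIMEOUT, 30)  # give up connecting after 30 s
    c.setopt(pycurl.TIMEOUT, 120)        # abort the whole transfer after 120 s
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    c.perform()
    c.close()
    print buf.getvalue()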
----------------------------------------------------------------------
Comment By: Bryan (btongminh) Date: 2008-01-24 08:06
Message: Logged In: YES user_id=1806226 Originator: NO
Note that it would be much easier to do settimeout if persistent_http were working. Unfortunately, it is not. I disabled it some time ago (http://fisheye.ts.wikimedia.org/browse/pywikipedia/trunk/pywikipedia/wikiped...) with a note saying it needs investigation. Is anybody here willing to do this investigation? It would not only solve Filnik's bug (site.conn.sock.settimeout), but it would also greatly improve performance for single-threaded bots.
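For completeness, the select.select alternative mentioned earlier looks roughly like this (a sketch with hypothetical names, not pywikipedia code):

    import select
    import socket

    def recv_with_timeout(sock, timeout=1500):
        # Block until the socket has data to read or the timeout expires.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise socket.timeout('no data within %s seconds' % timeout)
        return sock.recv(4096)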
----------------------------------------------------------------------