[Labs-l] Someone using a Python framework is relentlessly hammering Harvard sites, resulting in an IP range ban.

Merlijn van Deen (valhallasw) valhallasw at arctus.nl
Sun Dec 4 17:43:31 UTC 2016


Hi Martin,

On 4 December 2016 at 18:29, Martin Urbanec <martin.urbanec at wikimedia.cz>
wrote:

> I was running weblinkchecker.py for the whole of cswiki (the job was submitted
> to the grid at Sun, 20 Nov 2016 16:54:24 GMT) because I wanted a list of
> dead links. This may correspond to the UA (because I used the script named
> weblinkchecker.py). I trusted that this script wouldn't do anything wrong
> because it was, and still is, in the standard core package. I also use
> the 3.0-dev version of pywikibot and Python 2.7.6.
>
>
It probably wasn't you, but it was indeed the standard weblinkchecker
causing this. Apparently no throttling is implemented -- just a maximum
number of parallel connections. Many parallel connections are fine... but
not to the same host. This bot was running on eswiki, and eswiki has
thousands of links to http://www.minorplanetcenter.net/.
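
To illustrate the difference between capping parallel connections and
throttling per host, here is a minimal sketch in Python. It is not the
weblinkchecker code and not the eventual Pywikibot fix; the class name
HostThrottle, its methods, and the 5-second default delay are all
hypothetical choices made for the example. The idea is only that each
worker asks for a slot for its URL's host before fetching, so many
threads can still run in parallel as long as they hit different hosts.

```python
import threading
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical helper: enforce a minimum delay between requests per host."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        # min_delay is an arbitrary illustrative value, in seconds.
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        # host -> earliest time at which the next request may start
        self._next_allowed = {}

    def reserve(self, url):
        """Reserve the next slot for the URL's host; return the wait in seconds."""
        host = urlsplit(url).hostname or ""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_allowed.get(host, now))
            # Later callers for the same host queue up behind this slot.
            self._next_allowed[host] = start + self.min_delay
            return start - now

    def wait(self, url):
        """Block until this caller's slot for the URL's host arrives."""
        delay = self.reserve(url)
        if delay > 0:
            # Sleep outside the lock so other hosts are not held up.
            self._sleep(delay)
        return delay
```

Each worker thread would call throttle.wait(url) on a shared instance
right before opening the connection. Requests to different hosts return
immediately, while thousands of links to one host get spaced out by at
least min_delay instead of arriving all at once.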

I have contacted the user, and will file a bug for Pywikibot to get this
solved on that end.

Best,
Merlijn