[Wikipedia-l] Web spiders vs Wikipedia
Brion VIBBER
brion at pobox.com
Fri Nov 1 00:24:03 UTC 2002
I noticed this afternoon that something at IP 144.167.21.15 was
spidering the site, loading up thousands of page diffs, user
contributions pages, and other slow things at a rate of several per
second -- apparently as fast as it could get them in. I blocked its IP
from access to the /w/ directory (so it can only access regular pages
and default-view special pages via the / and /wiki/Foo paths; I put a
general prohibition into robots.txt as well), and the server load has
gone *dramatically* down.
It appears to be someone running 'WebStripper' trying to copy the whole
site; either it doesn't have sane throttling controls or they've
disabled it.
The IP is an unnamed host belonging to the University of Arkansas at Little
Rock; probably some college kid enjoying the wonders of uni network
bandwidth.
-- brion vibber (brion @ pobox.com)