I've been getting a lot of timeouts lately. A fair chunk of connections are from the 'grub' distributed search engine spider, which connects too often and doesn't play with robots.txt as nicely as I'd like. Some time ago I'd put them in the 403 rejection list, but they don't get the hint and keep on trying to connect.
This doesn't touch the database, but it does eat up some apache connections. Although I have now explicitly banned grub in robots.txt and filled out their little update form, they're still connecting -- even to the banned-for-all /w subdirectory. It's pissing me off.
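(For reference, the robots.txt rules are just the usual blanket bans, roughly as below; the 'grub-client' agent token is my guess at what their crawler sends, so adjust it to whatever actually shows up in the access logs.)

    # grub specifically, banned everywhere (agent token is a guess)
    User-agent: grub-client
    Disallow: /

    # the /w subdirectory is already banned for all crawlers
    User-agent: *
    Disallow: /w/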
I've upped the max connections on apache from 175 to 260, and on mysql from 400 to 560. Lee or Magnus: if we get mysql too-many-connections errors, either bring down the apache limit or move up the mysql limit.
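For whoever touches these later, the knobs in question are roughly the following (file locations and exact syntax are from memory, so double-check against the live configs before editing):

    # httpd.conf -- cap on simultaneous Apache children/connections
    MaxClients 260

    # my.cnf, [mysqld] section -- cap on simultaneous MySQL connections
    # (older set-variable syntax; newer MySQL also accepts max_connections=560 directly)
    set-variable = max_connections=560

The point of keeping the mysql limit comfortably above the apache one is that each apache child can be holding a database connection open at the same time.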
-- brion vibber (brion @ pobox.com)
On Wed, Apr 23, 2003 at 07:58:27AM -0700, Brion Vibber wrote:
> I've been getting a lot of timeouts lately. A fair chunk of connections are from the 'grub' distributed search engine spider, which connects too often and doesn't play with robots.txt as nicely as I'd like. Some time ago I'd put them in the 403 rejection list, but they don't get the hint and keep on trying to connect.
> This doesn't touch the database, but it does eat up some apache connections. Although I have now explicitly banned grub in robots.txt and filled out their little update form, they're still connecting -- even to the banned-for-all /w subdirectory. It's pissing me off.
> I've upped the max connections on apache from 175 to 260, and on mysql from 400 to 560. Lee or Magnus: if we get mysql too-many-connections errors, either bring down the apache limit or move up the mysql limit.
Perhaps it would be best to put in an iptables rule to deny a TCP/IP connection from their crawler (assuming it comes from a reasonably limited set of IPs)?
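Something per source network would do it, e.g. (192.0.2.0/24 is only a placeholder; we'd need their real address ranges):

    # drop port-80 traffic from one crawler source range (placeholder range)
    iptables -A INPUT -p tcp -s 192.0.2.0/24 --dport 80 -j DROP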
On Wed, 23 Apr 2003, Nick Reinking wrote:
> Perhaps it would be best to put in an iptables rule to deny a TCP/IP connection from their crawler (assuming it comes from a reasonably limited set of IPs)?
No, see, it's a *distributed* crawler. So I don't know where the fuck they're coming from. Like I said, it's really pissing me off.
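Since there's no fixed address range to drop, about all that's left is matching on the User-Agent at the Apache level, along these lines (the "grub" substring and the directory path are placeholders, not our live config):

    # httpd.conf -- needs mod_setenvif
    SetEnvIfNoCase User-Agent "grub" grub_bot
    <Directory "/var/www/w">
        Order Allow,Deny
        Allow from all
        Deny from env=grub_bot
    </Directory>

They still tie up a connection slot long enough to be handed the 403, which is exactly the problem.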
See also: http://www.grub.org/forums/viewtopic.php?t=124
-- brion vibber (brion @ pobox.com)
I hadn't heard about this Grub project, but I'm not impressed. I'd be impressed if they made the resulting data available via a free license.
--Jimbo
On Wed, Apr 23, 2003 at 08:32:49AM -0700, Jimmy Wales wrote:
> I hadn't heard about this Grub project, but I'm not impressed. I'd be impressed if they made the resulting data available via a free license.
> --Jimbo
I agree. I'm not impressed either way -- the attitude of the developers is, quite frankly, obnoxious.
> > I hadn't heard about this Grub project, but I'm not impressed. I'd be impressed if they made the resulting data available via a free license.
> > --Jimbo
> I agree. I'm not impressed either way -- the attitude of the developers is, quite frankly, obnoxious.
They've gotten plenty of press lately (Slashdot, New Scientist, et al.), but I have to say I'm not terribly impressed with their operation either. Basically, they've got an interesting idea, but a lousy execution and not much of a clear vision.
Maybe submitting a Slashdot story about their problems would be a kick in the pants?
Kurt Jansson wrote:
> Lee Daniel Crocker wrote:
> > Maybe submitting a Slashdot story about their problems would be a kick in the pants?
> Or telling them that we regard it as a DDoS attack and will sue them if they don't stop? (Just threaten to do it, of course.)
Worse than that, we'll write an NPOV article about their problems and post it in the most authoritative encyclopedia on the net, Wikipedia. Heh, that'll get 'em.
--Jimbo