Message: 7
Date: Tue, 11 Mar 2003 12:19:36 -0800 (PST)
From: Brion Vibber vibber@aludra.usc.edu
Subject: Re: [Wikitech-l] Re: what's going on with wikipedia ?
To: wikitech-l@wikipedia.org
Reply-To: wikitech-l@wikipedia.org

On Tue, 11 Mar 2003, Lee Daniel Crocker wrote:
> It appears we're being Googled this morning. Googlebot is very well-behaved, and I'm not sure if that's the problem or not, but

Googlebot is fairly light (several seconds to 30 seconds between requests) and follows our robots.txt restrictions: it's getting articles, not millions of diffs or contribs pages. It's only a fraction of total pages being served. I have written them an e-mail asking if it's possible to restrict the spidering to off-peak hours, though.
-- brion vibber (brion @ pobox.com)
Well Brion, off-peak hours are a bit of a problem with an international website, aren't they? When Germany goes to lunch (12:00 CET, Central European Time), the people in San Francisco come home from the bar (3:00 AM PST, Pacific Standard Time). So I think Google cannot really do anything about it, except treat every subdomain according to its time zone. (Otherwise people in Europe will ALWAYS have a slow Wikipedia, because Google thinks that is off-peak time.)

Another idea, which might or might not work, is the Apache module mod_throttle:
http://www.snert.com/Software/mod_throttle/
You could set a general minimum idle time between requests, or you could give penalties to db-heavy documents. Of course, this would make things slower still for some users, but at least the server would take the load without coming close to a crash.

Cheers Leo
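
To make the throttling idea above concrete, here is a minimal Python sketch of a per-client minimum idle time with extra penalties for db-heavy page kinds. It only illustrates the general technique; it is not mod_throttle's configuration or API. The class name, the page kinds ("article", "diff") and all the numbers are assumptions made up for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IdleThrottle:
    """Enforce a minimum idle time between requests from one client.

    Hypothetical helper for illustration only; not part of mod_throttle.
    """
    min_idle: float = 1.0  # seconds a client must wait after any accepted request
    penalties: Dict[str, float] = field(default_factory=dict)  # extra seconds per page kind
    clock: Callable[[], float] = time.monotonic  # injectable so the logic can be tested
    _next_allowed: Dict[str, float] = field(default_factory=dict)

    def check(self, client: str, kind: str = "article") -> float:
        """Return 0.0 if the request may proceed now, else the seconds left to wait."""
        now = self.clock()
        wait = self._next_allowed.get(client, now) - now
        if wait > 0:
            # Too early: the request is refused and the client's deadline is unchanged.
            return wait
        # Accepted: the next request must wait min_idle plus the penalty for this kind.
        penalty = self.penalties.get(kind, 0.0)
        self._next_allowed[client] = now + self.min_idle + penalty
        return 0.0


# Example policy (assumed values): 2 s between requests, 8 s extra after a diff page.
throttle = IdleThrottle(min_idle=2.0, penalties={"diff": 8.0})
```

A server would call `throttle.check(client_ip, kind)` before doing any database work and refuse the request when the result is non-zero, for example with a "503 Service Unavailable" response carrying a Retry-After header. How to classify a page as db-heavy is left open here, as it is in the message above.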
wikitech-l@lists.wikimedia.org