Message: 7
Date: Tue, 11 Mar 2003 12:19:36 -0800 (PST)
From: Brion Vibber <vibber(a)aludra.usc.edu>
Subject: Re: [Wikitech-l] Re: what's going on with wikipedia ?
To: wikitech-l(a)wikipedia.org
Reply-To: wikitech-l(a)wikipedia.org
On Tue, 11 Mar 2003, Lee Daniel Crocker wrote:
It appears we're being Googled this morning. Googlebot is very
well-behaved, and I'm not sure if that's the problem or not, but
Googlebot is fairly light (several seconds to 30 seconds between requests)
and follows our robots.txt restrictions - it's getting articles, not
millions of diffs or contribs pages. It's only a fraction of total pages
being served. I have written them an e-mail asking if it's possible to
restrict the spidering to off-peak hours, though.
-- brion vibber (brion @ pobox.com)
Well Brion, off-peak hours are a bit of a problem with an international
website, aren't they?
When Germany goes to lunch (12:00 CET - Central European Time), the
people in San Francisco come home from the bar (3:00 AM PST - Pacific
Standard Time).
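To make the arithmetic concrete, here is a small Python sketch. It
assumes fixed offsets (CET = UTC+1, PST = UTC-8), which holds for the
date of this thread since neither side is on daylight saving time yet;
the helper name to_pst is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (assumption: no daylight saving in effect).
CET = timezone(timedelta(hours=1), "CET")
PST = timezone(timedelta(hours=-8), "PST")

def to_pst(dt_cet):
    """Convert a timezone-aware CET datetime to PST."""
    return dt_cet.astimezone(PST)

# Lunchtime in Germany on the day of this thread.
lunch = datetime(2003, 3, 11, 12, 0, tzinfo=CET)
print(to_pst(lunch).strftime("%H:%M %Z"))  # 03:00 PST
```

The nine-hour gap means any single "off-peak" window for one audience
lands on somebody else's busy hours.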
So I think Google cannot really do anything about it, except treat every
sub-domain according to its timezone (otherwise people in Europe will
ALWAYS have a slow Wikipedia, because Google thinks that is off-peak
time).
Another idea which might or might not work is the Apache Module
mod_throttle
http://www.snert.com/Software/mod_throttle/
You could set a general minimum idle time between requests, or you could
give penalties to db-heavy documents. Of course, this would make things
slower still for some, but at least the server would take the load
without coming close to a crash.
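Just to illustrate the idea, here is a rough Python sketch of a
per-client throttle with a minimum idle time and penalties for expensive
pages. This is NOT mod_throttle's configuration or code; the class
Throttle, the path prefixes, and the penalty factors are all made up for
the example.

```python
import time

class Throttle:
    """Toy per-client throttle (hypothetical, not mod_throttle itself).

    Each accepted request blocks the same client for min_idle seconds,
    multiplied by a penalty factor when the path is db-heavy.
    """

    def __init__(self, min_idle=1.0, penalties=None, clock=time.monotonic):
        self.min_idle = min_idle
        self.penalties = penalties or {}   # path prefix -> penalty factor
        self.clock = clock                 # injectable for testing
        self.next_allowed = {}             # client -> earliest next request time

    def cost(self, path):
        """Idle time charged for a request to this path."""
        for prefix, factor in self.penalties.items():
            if path.startswith(prefix):
                return self.min_idle * factor
        return self.min_idle

    def allow(self, client, path):
        """Return True if the request may be served now, else False."""
        now = self.clock()
        if now < self.next_allowed.get(client, 0.0):
            return False
        self.next_allowed[client] = now + self.cost(path)
        return True
```

With something like Throttle(min_idle=1.0, penalties={"/diff": 5.0}), a
client fetching a diff would have to wait five seconds before its next
request is served, while plain article fetches cost one second each.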
Cheers
Leo