Lars Aronsson wrote:
> This sounds like a nice theory, but what you need first is the
> numbers. There are just sooo many more (100 times? 1000 times?)
> normal page views than diffs, history views or edits. And the
> normal page views are taken care of by caching proxies (Squid)
> already.
>
> I don't know what the numbers are today, or what the
> hit/miss ratio of the Squid cache is. It would be interesting to
> know. Are these statistics documented anywhere?
Page requests per day haven't been documented since October 2004:
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm
The diff was just an example. If we could push the bots onto the pool,
that would be another example. I don't know if that would be helpful.
Along with the numbers, we need to consider server load. Squid is one
method of off-loading work a tier away from the servers.
Anything that can be fetched from the database once, packaged together,
and passed to a node for processing can be off-loaded. Diffs seem like
an easy target to use as a first example; a rough sketch follows.
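
To make that concrete, here is a minimal Python sketch of the pattern,
assuming nothing about MediaWiki's internals: the server fetches both
revisions from the database once, bundles them into a self-contained
job, and a pool node computes the diff without ever touching the
database. The names (package_diff_job, worker_process) and the JSON
job format are made up purely for illustration.

    # Hypothetical sketch of the "fetch once, package, off-load" idea.
    # None of these names come from MediaWiki; they only show the shape.
    import difflib
    import json

    def package_diff_job(page_title, old_text, new_text):
        """Bundle everything a worker needs, so it never hits the DB."""
        return json.dumps({
            "task": "diff",
            "page": page_title,
            "old": old_text,
            "new": new_text,
        })

    def worker_process(job):
        """Runs on a pool node: unpack the job, compute the diff locally."""
        data = json.loads(job)
        diff = difflib.unified_diff(
            data["old"].splitlines(keepends=True),
            data["new"].splitlines(keepends=True),
            fromfile=data["page"] + " (old)",
            tofile=data["page"] + " (new)",
        )
        return "".join(diff)

    # The server does one database round trip for both revisions,
    # then ships the packaged job to whichever node is free.
    job = package_diff_job("Example", "one\ntwo\n", "one\n2\n")
    print(worker_process(job))

The point of the packaging step is that the job carries all of its
inputs, so the node needs no database connection and the server pays
only one fetch per diff request.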
Jonathan