I've often wondered this, so this is a great opportunity to jump in. Why not cache prerendered versions of all pages? It would seem that the majority of hits are reads. One approach I've seen elsewhere is to cache a page the first time it's loaded, and then have writes invalidate the cache. (That way you're not caching pages nobody looks at.)
We have multiple caches. First of all, all pages are cached by Squids, which achieve >75% hit rates for anonymous users. Cached objects are invalidated by an HTCP CLR message sent via multicast to our global Squid deployment.
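For illustration, a rough Python sketch of what sending an HTCP CLR purge over multicast might look like (the packet layout follows RFC 2756; the multicast group address here is made up, and this is a sketch, not our actual production code):

    import random, socket, struct

    def htcp_clr_packet(url):
        # HTCP CLR data: a specifier of COUNTSTRs (16-bit length + bytes)
        # for METHOD, URI, VERSION, and empty REQ-HDRS, per RFC 2756
        u = url.encode()
        spec = struct.pack('!H', 4) + b'HEAD'
        spec += struct.pack('!H', len(u)) + u
        spec += struct.pack('!H', 8) + b'HTTP/1.0'
        spec += struct.pack('!H', 0)                  # no headers
        data_len = 8 + 2 + len(spec)                  # data header + spec + reason
        total_len = 4 + data_len + 2
        op_clr = 4                                    # HTCP opcode CLR
        trans_id = random.getrandbits(32)
        return (struct.pack('!HxxHBxIxx', total_len, data_len, op_clr, trans_id)
                + spec + struct.pack('!H', 2))

    def purge(url, group='239.128.0.112', port=4827): # group is hypothetical
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)
        s.sendto(htcp_clr_packet(url), (group, port))

A single multicast send reaches every Squid subscribed to the group, which is what makes invalidating a globally distributed cache cheap.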
Squid caching also gives us an easy way to bring lots of content closer to users, thus reducing page load times dramatically for anons (and a bit for logged-in users).
If we get the possibility to deploy caches in Australia and China, that'd be awesome. Right now we're searching for Chinese and Australian locations; even small deployments of one to three servers would cover those huge countries :)
We cannot cache whole pages for logged-in users, as pages can render differently for each of them, though at some point we might achieve that.
For now, there is also the parser cache, which caches parsed documents for logged-in users as well. We're trying to increase its efficiency too.
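To illustrate the idea (the key scheme here is an assumption for the sketch, not MediaWiki's actual one): a parser cache has to fold the rendering-relevant user preferences into its key, which is why its hit rate is harder to keep high than the Squids':

    cache = {}

    def parser_cache_key(page_id, user_options):
        # logged-in users can render differently, so any preference
        # that affects rendering has to be part of the key
        return (page_id, tuple(sorted(user_options.items())))

    def get_parsed(page_id, user_options, parse):
        key = parser_cache_key(page_id, user_options)
        if key not in cache:
            cache[key] = parse(page_id, user_options)  # render on miss
        return cache[key]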
One tricky part is that writes to page B can affect page A if page A has a link to B. A reverse index of links would solve this, though I don't know how big it'd be.
We have a reverse index of links, and we use it to invalidate both parser cache and Squid cache objects.
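A toy sketch of the idea (purge_parser_cache and send_htcp_clr are hypothetical stand-ins for the real invalidation hooks):

    from collections import defaultdict

    def purge_parser_cache(page):            # stand-in for the real hook
        print('parser cache purge:', page)

    def send_htcp_clr(page):                 # stand-in for the real hook
        print('HTCP CLR:', page)

    links = {'A': {'B', 'C'}, 'D': {'B'}}    # page -> pages it links to

    backlinks = defaultdict(set)             # reverse index: target -> linkers
    for page, targets in links.items():
        for target in targets:
            backlinks[target].add(page)

    def on_save(page):
        # invalidate the saved page plus every page that links to it
        for p in {page} | backlinks[page]:
            purge_parser_cache(p)
            send_htcp_clr(p)

    on_save('B')   # purges B, plus A and D, which link to B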
Domas
On 9/8/05, Domas Mituzas <Domas.Mituzas@microlink.lt> wrote:
> > Why not cache prerendered versions of all pages?
> We have multiple caches. [explanation snipped]

Thanks for the explanation! And with all of that, rendering is still the bottleneck?
Evan Martin wrote:
> On 9/8/05, Domas Mituzas <Domas.Mituzas@microlink.lt> wrote:
> > > Why not cache prerendered versions of all pages?
> > We have multiple caches. [explanation snipped]
>
> Thanks for the explanation! And with all of that, rendering is still the bottleneck?
Not really; Article::view is only 19% of profiling time, based on the figures I posted. As I was saying, I prefer to discourage the "bottleneck" mindset. An average request is very diffuse. It's meaningful to talk about the bottleneck of some slow functions, but there's no "bottleneck of MediaWiki".
To put it another way: in that profiling run, there were 17983 requests. 10556 (58%) of them were action=view. 8952 (49%) were for the current revision; the other 9% were diffs and old revisions. Of those 8952 requests, about half would have been parser cache hits (the current hit ratio is 53%). So only about 23% of requests require rendering of the current article text. And this is meant to be our bottleneck?
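To spell out the arithmetic as a quick back-of-the-envelope check:

    total   = 17983
    views   = 10556                # action=view: ~58% of requests
    current = 8952                 # current-revision views: ~49%
    hit     = 0.53                 # parser cache hit ratio
    renders = current * (1 - hit)  # ~4207 actual parses
    print(renders / total)         # ~0.23, i.e. about 23% of all requests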
-- Tim Starling