On 9/7/05, Tim Starling t.starling@physics.unimelb.edu.au wrote:
Jeremy Dunck wrote:
According to the hardware orders page[1], a major bottleneck is page rendering.
See http://meta.wikimedia.org/wiki/Profiling/20050822. I don't think I'd call any of it a bottleneck; optimisation aimed at reducing average page view time or CPU load is hard work. I suspect there are hotspots, but in the PHP VM rather than in our code. I think the best way to reduce page view time at this stage would be to optimise that, or to produce a JIT compiler.
I've often wondered this, so this is a great opportunity to jump in. Why not cache prerendered versions of all pages? It would seem that the majority of hits are reads. One approach I've seen elsewhere is to cache a page the first time it's loaded, and then have writes invalidate the cache. (That way you're not caching pages nobody looks at.)
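To make the idea concrete, here's a minimal sketch of cache-on-first-read with invalidation on write. The render_page() function and the two dicts are toy stand-ins I've invented for illustration, not MediaWiki's actual parser, storage, or cache layer (which would presumably be something shared like memcached):

    # Sketch of cache-on-first-read with invalidation on write.
    # page_text and render_page() are trivial stand-ins for the real
    # wikitext store and parser; rendered_cache stands in for a shared
    # cache such as memcached.

    page_text = {"Main_Page": "Welcome to the wiki."}   # raw wikitext (stand-in)
    rendered_cache = {}                                  # title -> rendered HTML

    def render_page(title):
        """Stand-in for the expensive wikitext -> HTML step."""
        return "<html><body>%s</body></html>" % page_text[title]

    def get_page_html(title):
        """Read path: serve cached HTML, rendering and caching on a miss."""
        html = rendered_cache.get(title)
        if html is None:
            html = render_page(title)
            rendered_cache[title] = html
        return html

    def save_page(title, new_text):
        """Write path: store the new text and drop the cached rendering."""
        page_text[title] = new_text
        rendered_cache.pop(title, None)   # next read re-renders

    # First read renders, second read hits the cache, a write invalidates.
    get_page_html("Main_Page")
    get_page_html("Main_Page")
    save_page("Main_Page", "Welcome back.")
    get_page_html("Main_Page")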
It seems the total content of Wikipedia isn't that big.
One tricky part is that a write to page B can affect page A if page A links to B. A reverse index of links would solve this, though I don't know how big it'd be. A sketch of what I mean follows.
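Building on the toy cache above, this is roughly how a reverse link index could drive invalidation. extract_links() is a hypothetical helper that pulls [[...]] targets out of wikitext, and for brevity the sketch never removes stale backlink entries:

    import re

    # Reverse link index: backlinks maps a title to the set of pages
    # that link to it, so a write to B can invalidate every cached page
    # that links to B (e.g. because its link colouring may change).

    backlinks = {}   # title -> set of titles that link to it

    def extract_links(wikitext):
        """Hypothetical helper: pull [[Target]] links out of wikitext."""
        return set(re.findall(r"\[\[([^\]|]+)", wikitext))

    def save_page(title, new_text):
        page_text[title] = new_text
        # Record this page as a backlink of every page it now links to.
        # (A real version would also remove links that were dropped.)
        for target in extract_links(new_text):
            backlinks.setdefault(target, set()).add(title)
        # Invalidate the page itself and everything that links to it.
        rendered_cache.pop(title, None)
        for linking_title in backlinks.get(title, ()):
            rendered_cache.pop(linking_title, None)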
(PS: Sorry to be a back seat driver -- I know how frustrating it is when someone else tries to tell you they know your problem space better than you do. So if this is a stupid suggestion, feel free to tell me so; I'm mostly interested in the why/why not answer.)