Hi - clearly, it'd be great if Wikipedia had better performance.
I looked at some of the "Database benchmarks" postings, but I don't see any analysis of what's causing the ACTUAL bottlenecks on the real system (with many users & full database). Has someone done that analysis?
I suspect you guys have considered far more options, but as a newcomer who's just read the source code documentation, maybe some of these ideas will be helpful:
1. Perhaps for simple reads of the current article (cur), you could skip MySQL entirely and use the filesystem instead. Simple encyclopedia articles could be stored in the filesystem, one article per file. To avoid the huge-directory problem (which many filesystems don't handle well, though ReiserFS does), you could use the terminfo trick: create subdirectories keyed on the first, second, and maybe even the third characters of the title. E.g., "Europe" lives in "wiki/E/u/r/Europe.text". The existence of a file can double as the link test. This may or may not beat MySQL, but I suspect it would: OS developers have been optimizing file access for a very long time, and you replace a userspace<->kernel<->userspace round trip with a single userspace<->kernel one. You also completely avoid locking and other joyless issues. (There's a rough sketch of the path scheme after item 3.)
2. The generation of HTML from the wiki format could be cached, as has been discussed. It could also be sped up, e.g., by rewriting it in flex. I suspect it'd be easy to rewrite the wiki-to-HTML translation as a flex scanner and get something quite fast; my "html2wikipedia" is written in flex - it's really fast and didn't take long to write. The real problem is that I suspect this isn't the bottleneck. (A toy sketch of the rule-based translation idea is below, after item 3.)
3. You could start sending text out as soon as it's generated, instead of batching it. Many browsers start displaying text as it arrives, so pages might _feel_ faster to users even if the total time is unchanged. Also, holding the whole page in memory creates memory pressure that can force more useful data out of memory. (A small flushing example is below.)
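
To make idea 1 concrete, here's a minimal sketch of the path scheme. It's Python just for illustration (the real code would live in the PHP layer), and the names ARTICLE_ROOT, article_path, etc. are made up; real titles would also need sanitizing before being used as filenames.

```python
import os

ARTICLE_ROOT = "wiki"   # hypothetical root directory for article files
SHARD_DEPTH = 3         # subdirectories for the first, second, third characters

def article_path(title):
    """Map an article title to its file, e.g. Europe -> wiki/E/u/r/Europe.text."""
    shards = [title[i] for i in range(min(SHARD_DEPTH, len(title)))]
    return os.path.join(ARTICLE_ROOT, *shards, title + ".text")

def article_exists(title):
    """The link test: an article exists iff its file does."""
    return os.path.exists(article_path(title))

def read_article(title):
    """Serve the current article text straight from disk, no database hit."""
    with open(article_path(title), encoding="utf-8") as f:
        return f.read()
```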
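For idea 2, here's a toy illustration of the rule-table shape of the translation, shown in Python with a couple of simplified markup rules just to make the idea visible. The actual suggestion is a flex scanner (generated C, single pass), which is where the speed would come from; this regex version is only pseudocode for the rules.

```python
import re

# A couple of toy wiki-markup -> HTML rules; the real grammar has many more.
RULES = [
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),            # '''bold'''
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),              # ''italic''
    (re.compile(r"\[\[([^\]|]+)\]\]"),                     # [[Link]]
     r'<a href="/wiki/\1">\1</a>'),
]

def wiki_to_html(text):
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(wiki_to_html("'''Europe''' is a [[continent]]."))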
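And for idea 3, the point is just to flush output as each chunk is rendered rather than accumulating the whole page; in PHP that would mean calling flush() after each piece. Here's the shape of it in Python (render_sections and the paragraph splitting are made up for the example):

```python
import sys

def render_sections(article_text):
    # Hypothetical: yield the page piece by piece as it is converted,
    # rather than returning one big string at the end.
    for paragraph in article_text.split("\n\n"):
        yield "<p>" + paragraph + "</p>\n"

def serve(article_text, out=sys.stdout):
    out.write("<html><body>\n")
    out.flush()                      # top of the page goes out immediately
    for chunk in render_sections(article_text):
        out.write(chunk)
        out.flush()                  # browser can start rendering this chunk now
    out.write("</body></html>\n")
    out.flush()

serve("First paragraph.\n\nSecond paragraph.")
```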
Anyway, I don't know if these ideas are all that helpful, but I hope they are.