Hello,
You were able to solve the problem by adding hardware. Whether that is a victory or a failure is a matter of opinion.
See, the community knows more about new hardware, because they pay for it. They do not pay the developers, and therefore do not know what software optimizations were introduced. And we did solve many problems by introducing software optimizations. So far, the operation of the Wikimedia cluster is a pure victory.
Yes, blindly hacking away, disabling useful functions at random, instead of analyzing where the bottlenecks were. It was very embarrassing to watch. The growth numbers in 2002-2003 for susning.nu (which still runs on a single server) show what software optimization can do, had you chosen that path.
What we have now is content produced by massive collaboration with a small export of it via our caches to the open world.
Sir, we have profiling information. We know where our bottlenecks are, and we deal with them. I do know how to do software optimization for websites, though: remove as much magic as possible, use very simplistic methods, ... We could serve the whole website as HTML files from a single Pentium 4 box at gigabit speeds. We could drop accessibility and use lots of AJAX for all content loading and interaction. We could use MediaWiki as an application server instead of a script. There are still lots of challenges ahead, and there are ways to do things better.
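To make the "serve everything as HTML files" point concrete, here is a minimal sketch (in Python, purely illustrative, not what Wikimedia actually runs): pre-render pages once, then let a trivial threaded file server with cache-friendly headers hand them out, so front-end caches absorb most of the traffic. The "rendered" directory and the port are made up for the example.

    # Minimal static-file server sketch: serves pre-rendered HTML pages.
    from functools import partial
    from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

    class CachedHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            # Long cache lifetime so upstream caches (e.g. Squid) can serve repeat hits.
            self.send_header("Cache-Control", "public, max-age=3600")
            super().end_headers()

    if __name__ == "__main__":
        # "rendered" is a hypothetical directory of pre-rendered article HTML.
        handler = partial(CachedHandler, directory="rendered")
        ThreadingHTTPServer(("", 8080), handler).serve_forever()

That is "remove as much magic as possible" taken to its extreme; the trade-off, as said above, is dropping everything dynamic.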
If you have any specific ideas on how to 'optimize' all that, feel free: join the #mediawiki or #wikimedia-tech channels, discuss the profiler output with the developers, brainstorm on how things can be done better.
I also have cases of 'growth numbers' in my portfolio, but none of them reached the scale of Wikipedia: not in load, not in variety of users, not in size of content. And we did have bigger full-time development/operations teams and budgets as well.
Cheers, Domas
wikimedia-l@lists.wikimedia.org