You were able to solve the problem by adding hardware.
Whether that is a victory or a failure is a matter of opinion.
See, the community knows more about new hardware, because
they pay for it. They do not pay the developers and therefore do not
know what software optimizations were introduced. And we did
solve many problems by introducing software optimizations.
So far, operation of the Wikimedia cluster is a pure victory.
Yes, blindly hacking away, disabling useful functions
instead of analyzing where the bottlenecks were. It was very
embarrassing to watch. The growth numbers in 2002-2003 for
susning.nu (which still runs on a single server) show what
software optimization can do, had you chosen that path.
What we have now is content produced by massive
collaboration, with only a small export of it via our caches
to the open world.
Sir, we have profiling information. We know where our bottlenecks are.
We deal with those bottlenecks. I do know how to do software optimization
for websites, though: remove as much magic as possible, use very simplistic
methods, ... We could serve the whole website as HTML files from a single
Pentium 4 box at gigabit speeds. We could drop accessibility and use
lots of AJAX for all content loading and interaction. We could use MediaWiki
as an application server instead of a script. There are still lots of challenges ahead.
There are ways to do things better.
If you have any specific ideas on how to 'optimize' all that - feel free.
Join the #mediawiki or #wikimedia-tech channels, discuss the profiler
output with the developers, brainstorm on how things can be done better.
I also have cases of 'growth numbers' in my portfolio, but none of
them reached the scale of Wikipedia, neither in load, nor in variety
of users, nor in size of content. And we did have bigger full-time
development/operations teams, and bigger budgets as well.