Hi!
Much of recent development and administration has focused on caching, clustering, failover, load balancing, and so on.
This is more a matter of architecture than of the software used.
It seems to me that the decision to use a ready-made application server like JBoss, Resin or Zope (or WebSphere if you want proprietary) or a different database server would make a greater difference for long term deployment of a very large scale wiki farm like Wikimedia than the choice of a particular programming language (though of course one may imply the other).
I'm not aware of any huge-scale environments that run on JBoss, Resin or Zope (or WebSphere). Application servers with all the integration magic are required mostly for complex applications that have hundreds of thousands of developers ;)
Regarding databases: here again, the architecture dictates what you use. Right now our architecture consists of:
a) Small replicated core database sets (per-language)
b) Pools of replicated text storage nodes
c) Pool of lossy in-memory hash store nodes
We will probably also add:
d) Pool of fully clustered storage, for session objects.
What we can do is introduce different storage paradigms for different objects, and for each one choose software that works and is easy to maintain.
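To make "different storage paradigms for different objects" a bit more concrete, here is a minimal Python sketch of routing each kind of object to the tier suited to it. Everything in it is hypothetical: the class names, the object kinds and the tiny cache capacity are illustrative assumptions, and the lossy tier is modelled as a small LRU-evicting dictionary, not as any real memcached client. The one property it tries to show is that a lossy store may forget entries at any time, so a miss must be treated as normal and answered from durable storage, while sessions go to a store that does not drop data.

```python
from collections import OrderedDict


class LossyStore:
    """Hypothetical in-memory hash store that silently evicts entries (LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # a miss is normal for a lossy store
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


class DurableStore:
    """Hypothetical stand-in for replicated database or text storage."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class StorageRouter:
    """Sends each object kind to the storage tier chosen for it (assumed layout)."""

    def __init__(self, cache_capacity=2):
        self.text = DurableStore()                # b) text storage pool
        self.cache = LossyStore(cache_capacity)   # c) lossy in-memory cache
        self.sessions = DurableStore()            # d) clustered session storage

    def put_text(self, rev_id, value):
        self.text.set(rev_id, value)

    def get_text(self, rev_id):
        cached = self.cache.get(("text", rev_id))
        if cached is not None:
            return cached
        value = self.text.get(rev_id)  # fall back to durable storage on a miss
        if value is not None:
            self.cache.set(("text", rev_id), value)  # repopulate the cache
        return value

    def put_session(self, session_id, data):
        # Sessions must survive, so they never go to the lossy tier.
        self.sessions.set(session_id, data)

    def get_session(self, session_id):
        return self.sessions.get(session_id)
```

With a capacity of two, reading three revisions evicts the first one from the cache, yet `get_text` still returns it from the durable tier and puts it back in the cache. The design choice illustrated is simply that correctness never depends on the lossy tier.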
That being said, from the rough numbers I've seen about similarly sized sites like eBay or Amazon.com, which typically use such application server architectures, we are running on a ridiculously
I'm not totally familiar with all enterprise software, but Amazon at least uses a pretty lightweight setup, with most of the work routed to 'services' over SOAP, WSDL, yadda yadda. I'm not sure they need a full-blown app server at the front for this.
Nevertheless, it seems clear that our "roll your own" approach, while more intensive in developer work, can save significantly on hardware. It's also interesting to compare Flickr's technological evolution, which is quite similar to our own: http://www.ludicorp.com/flickr/zend-talk.ppt
I'm not sure it takes us longer to roll our own stuff than it would take to leverage all the other solutions.
Is similar information available about Yahoo!'s setup?
They're quite similar to us; they just have more redundancy across multiple datacenters (and they also serve from multiple datacenters at the same time).
with new languages, and even MediaWiki itself comes with an OCaml extension. :-) It's certainly a rich learning environment.
Well, yes, we even have direct PHP extensions written in C/C++, and there's various code outside MediaWiki in Boo, Python, C# and Perl too ;-)