Hi!
> Much of recent development and administration has focused on caching,
> clustering, failover, load balancing, and so on.
This is more a matter of architecture than of the software used.
> It seems to me that the decision to use a ready-made application server
> like JBoss, Resin or Zope (or WebSphere if you want proprietary) or a
> different database server would make a greater difference for long term
> deployment of a very large scale wiki farm like Wikimedia than the choice
> of a particular programming language (though of course one may imply the
> other).
I'm not aware of any huge-scale environments that run on JBoss, Resin
or Zope (or WebSphere). Application servers with all their integration
magic are required mostly for complex applications that have hundreds of
thousands of developers ;)
Regarding databases: here again, the architecture dictates what you use.
Right now our architecture consists of:
a) Small replicated core database sets (per-language)
b) Pools of replicated text storage nodes
c) A pool of lossy in-memory hash store nodes
We will probably be adding:
d) A pool of fully clustered storage, for session objects.
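To make item c) concrete, here is a minimal sketch of how a client might spread keys over a pool of lossy in-memory nodes, memcached-style. Everything in it is an assumption for illustration: the class and function names are hypothetical, the node names are made up, and it uses plain modulo hashing with FIFO eviction rather than whatever our actual client does. The point is only that any read may miss, so callers must be able to recompute the value.

```python
import hashlib


class LossyCachePool:
    """Hypothetical client for a pool of lossy in-memory cache nodes.

    Each node is modeled as a dict with a fixed capacity; when a node is
    full, its oldest entry is evicted, so any read may miss.
    """

    def __init__(self, node_names, capacity_per_node=2):
        self.node_names = list(node_names)
        self.capacity = capacity_per_node
        # One dict per node; dicts keep insertion order, giving FIFO eviction.
        self.nodes = {name: {} for name in self.node_names}

    def node_for(self, key):
        # Stable hash (not Python's per-process randomized hash()) so every
        # client maps the same key to the same node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.node_names)
        return self.node_names[index]

    def set(self, key, value):
        store = self.nodes[self.node_for(key)]
        store.pop(key, None)
        if len(store) >= self.capacity:
            # Lossy: drop the oldest entry instead of failing the write.
            oldest = next(iter(store))
            del store[oldest]
        store[key] = value

    def get(self, key):
        # A miss returns None; the caller is expected to recompute.
        return self.nodes[self.node_for(key)].get(key)


def get_or_compute(pool, key, compute):
    """Read-through helper: use the cached value, or compute and store it."""
    value = pool.get(key)
    if value is None:
        value = compute()
        pool.set(key, value)
    return value
```

Usage: `get_or_compute(pool, "enwiki:page:Main_Page", render)` returns the cached value on a hit and calls the (hypothetical) `render` function on a miss. Note that modulo hashing remaps most keys when a node is added or removed; consistent hashing is the usual remedy if that matters.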
What we can do is introduce different storage paradigms for different
objects, and there we choose software that works and is easy to maintain.
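A minimal sketch of choosing a storage paradigm per object kind, assuming nothing more than a lookup table from kind to backend. The kind names and backend labels are hypothetical, and `DictStore` is an in-memory stand-in for a real database, text store, cache, or session cluster.

```python
from typing import Dict, Optional, Protocol


class Store(Protocol):
    """Minimal interface every storage tier is assumed to offer."""

    def get(self, key: str) -> Optional[object]: ...

    def set(self, key: str, value: object) -> None: ...


class DictStore:
    """In-memory stand-in for a real tier (hypothetical, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self.data.get(key)

    def set(self, key: str, value: object) -> None:
        self.data[key] = value


class StorageRouter:
    """Sends each object kind to the storage tier configured for it."""

    def __init__(self, routes: Dict[str, Store]) -> None:
        self.routes = routes

    def store_for(self, kind: str) -> Store:
        if kind not in self.routes:
            raise ValueError(f"no storage tier configured for {kind!r}")
        return self.routes[kind]

    def get(self, kind: str, key: str) -> Optional[object]:
        return self.store_for(kind).get(key)

    def set(self, kind: str, key: str, value: object) -> None:
        self.store_for(kind).set(key, value)


# Hypothetical mapping loosely following tiers a) to d) above.
router = StorageRouter({
    "page_metadata": DictStore("core-db"),
    "revision_text": DictStore("text-storage"),
    "parser_cache": DictStore("lossy-cache"),
    "session": DictStore("clustered-session-store"),
})
```

Keeping the mapping in one place means a new tier, such as d), can be introduced for one kind of object without touching the code paths for the others.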
> That being said, from the rough numbers I've seen about similarly sized
> sites like eBay or Amazon.com, which typically use such application
> server architectures, we are running on a ridiculously
I'm not totally compatible with all enterprise software, but at least
Amazon is using a pretty lightweight setup, with most of the stuff being
routed to 'services' around it: SOAP, WSDL, yadda yadda. I'm not sure
they need a full-blown app server at the front for this.
> Nevertheless, it seems clear that our "roll your own" approach, while
> more intensive in developer work, can save significantly on hardware.
> It's also interesting to compare Flickr's technological evolution,
> which is quite similar to our own:
> http://www.ludicorp.com/flickr/zend-talk.ppt
I'm not sure it takes us longer to roll our own stuff than it would
take to leverage all the other solutions.
> Is similar information available about Yahoo!'s setup?
They're quite similar to us; it's just that they have more redundancy
across multiple datacenters (and they also practice serving from
multiple datacenters at the same time).
> with new languages, and even MediaWiki itself comes with an OCaml
> extension. :-) It's certainly a rich learning environment.
Well, yes, we even have direct PHP extensions written in C/C++, and
there's various code outside MediaWiki in Boo, Python, C# and Perl
too ;-)
--
Domas Mituzas --
http://dammit.lt/ -- [[user:midom]]