On Tue, Jul 22, 2008 at 7:03 PM, Dirk Riehle <dirk@riehle.org> wrote:
Here is an interesting alternative implementation for MediaWiki/Wikipedia:
http://armstrongonsoftware.blogspot.com/2008/06/itching-my-programming-nerve...
(the discussion starts 30 minutes into the video)
Basically, it is a P2P backend that claims order-of-magnitude performance gains for writing pages. They ignore the front-end caches, etc. It is written in Erlang (plus Java).
I was trying to figure out whether this would really achieve feature parity, but I couldn't fully tell.
For the rendering, they use plog4u. Does anyone know whether it has feature parity with MediaWiki markup? We used JAMWiki (a Java implementation of MediaWiki), only to discover later that no ParserFunctions extension was available for it. (Why is this an extension rather than part of the core in the first place?)
Thanks! Dirk
The slides for the talk are on the OnScale site <http://www.onscale.de/Reinefeld_Erlang_Exchange.pdf>, although I don't see an actual performance comparison between the distributed architecture and the current Wikipedia setup.
He seems to ignore not only Squid but also the key-value store MediaWiki is already well integrated with: memcached. I think he's talking about something more complex (I only understand parts of it), but I don't think Wikipedia is a big dumb behemoth as far as architecture goes. I've always thought of it as the opposite: a lean model of incredible performance on an incredibly small budget.
Anyway, he also seems to assume that the scalability bottleneck lies entirely in the 2000/s write requests rather than in the 48000/s read requests. Is that actually the case? On the server roles page <https://wikitech.leuksman.com/view/Server_roles> I see 10 database servers and hundreds of Apaches/Squids, so I'm dubious.
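A quick back-of-the-envelope check, taking the 2000/s write and 48000/s read figures above at face value (the shares below are just my own arithmetic on those two numbers, not something from the talk):

```python
# Request rates as quoted above (requests per second); assumed accurate.
READS_PER_SEC = 48_000
WRITES_PER_SEC = 2_000

def traffic_mix(reads: int, writes: int) -> tuple[float, float]:
    """Return (share of all requests that are reads, reads per write)."""
    total = reads + writes
    return reads / total, reads / writes

read_share, reads_per_write = traffic_mix(READS_PER_SEC, WRITES_PER_SEC)
print(f"reads: {read_share:.0%} of requests, {reads_per_write:.0f} reads per write")
```

That prints "reads: 96% of requests, 24 reads per write", so even a large speedup on the write path alone touches only about 4% of the request volume.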
I'd be curious to hear what Brion or another Wikimedia engineer has to say about this, if anyone has time.