On Tue, Jul 22, 2008 at 7:03 PM, Dirk Riehle <dirk(a)riehle.org> wrote:
Here's an interesting alternative implementation for
MediaWiki/Wikipedia:

* http://armstrongonsoftware.blogspot.com/2008/06/itching-my-programming-nerv…
* http://video.google.com/videoplay?docid=6981137233069932108
  (Wikipedia discussion starts 30 minutes into the video)
Basically, it's a p2p backend that claims an order-of-magnitude
performance gain for writing pages. They ignore the front-end caches,
etc. It's done in Erlang (+Java).
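If I followed the talk correctly, the core idea is to replace the
master database on the write path with a replicated key-value store,
so a page write only needs a majority of replicas to acknowledge.
Here is a rough Java sketch of what I mean (the names and the quorum
logic are my own illustration, not their actual Erlang code):

    import java.util.List;

    // My own sketch of a quorum-style page write against replicated
    // key-value nodes; not the actual Erlang implementation.
    interface KeyValueNode {
        boolean put(String key, String value); // true once the node acks
    }

    class ReplicatedPageStore {
        private final List<KeyValueNode> replicas;

        ReplicatedPageStore(List<KeyValueNode> replicas) {
            this.replicas = replicas;
        }

        // The write succeeds once a majority of replicas acknowledge,
        // instead of waiting on a single master database.
        boolean savePage(String title, String wikitext) {
            int acks = 0;
            for (KeyValueNode node : replicas) {
                if (node.put("page:" + title, wikitext)) {
                    acks++;
                }
            }
            return acks > replicas.size() / 2;
        }
    }

If that is the model, the claimed gain would come from writes being
spread across replica groups rather than funneled through one master.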
I was trying to figure out whether this would really offer feature
parity, but I couldn't fully tell.
For the rendering they use plog4u. Does someone know whether it has
feature parity with MediaWiki markup? We used JAMWiki (a Java
implementation of MediaWiki) only to discover later that no
ParserFunctions extension was available. (Why is this an extension
rather than part of the core in the first place?)
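For anyone who hasn't run into it: ParserFunctions is what gives
templates conditionals, arithmetic, and case switches, so a parser
without it breaks most template-heavy pages. For example:

    {{#if: {{{image|}}} | [[Image:{{{image}}}]] | (no image) }}
    {{#expr: 2 * 3 + 4 }}          (renders as 10)
    {{#switch: {{{lang}}} | de = German | en = English | #default = unknown }}

Without at least {{#if:}} and {{#expr:}}, the rendered output of most
infobox templates diverges from what Wikipedia shows.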
Thanks!
Dirk
The slides for the talk are on the OnScale site <
http://www.onscale.de/Reinefeld_Erlang_Exchange.pdf>, although I don't see
an actual performance comparison between the distributed architecture and
the current Wikipedia setup.
He seems to ignore not only Squid, but also the key-value store MediaWiki is
already well integrated with: memcached. I think he's talking about
something more complex (I only understand parts of it), but I don't think
Wikipedia is a big dumb behemoth as far as architecture goes; I've
always thought of it as the opposite: a lean model of incredible
performance on an incredibly small budget.
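As I understand MediaWiki's use of memcached, most reads never touch the
databases at all: rendered output and parsed objects are served from the
cache and only fall through on a miss. A rough sketch of that read-through
pattern (the interfaces here are hypothetical stand-ins, since the real
code is PHP):

    // Rough sketch of the read-through caching pattern MediaWiki uses
    // with memcached; CacheClient and Database are hypothetical
    // stand-ins, not the actual implementation.
    interface CacheClient {
        String get(String key);
        void set(String key, String value, int ttlSeconds);
    }

    interface Database {
        String loadRenderedPage(String title);
    }

    class PageReader {
        private final CacheClient cache;
        private final Database db;

        PageReader(CacheClient cache, Database db) {
            this.cache = cache;
            this.db = db;
        }

        String getPage(String title) {
            String key = "rendered:" + title;
            String cached = cache.get(key);
            if (cached != null) {
                return cached;          // served without touching the DB
            }
            String page = db.loadRenderedPage(title);
            cache.set(key, page, 3600); // cache for an hour
            return page;
        }
    }

With that layer in place, the write path he's optimizing is only a small
slice of the total load.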
Anyway, he also seems to be assuming that the scalability bottleneck is all
in the 2000/s write requests rather than the 48000/s read requests. Is
this actually the case? By those numbers, reads outnumber writes 24 to 1,
which is precisely the load the caching layers exist to absorb. On the
server roles page <
https://wikitech.leuksman.com/view/Server_roles> I see 10 database servers
and hundreds of Apaches/Squids, so I'm dubious.
I'd be curious to hear what Brion or another Wikimedia engineer has to say
about this, if any of them have time.