Benjamin Lees wrote:
The slides for the talk are on the OnScale site
<http://www.onscale.de/Reinefeld_Erlang_Exchange.pdf>, although I don't see
an actual comparison in performance between the distributed architecture and
the current Wikipedia setup.
He seems to ignore not only Squid, but also the key-value store MediaWiki is
already well-integrated with: memcached. I think he's talking about
something more complex (I only understand parts of it), but I don't think
Wikipedia is much of a big dumb behemoth as far as architecture goes; I've
always thought of it as the opposite, the lean model of incredible
performance on an incredibly small budget.
Anyway, he also seems to be assuming that the scalability bottleneck is all
in the 2000/s write requests, rather than the 48000/s read requests. Is
this actually the case? On the server roles page
<https://wikitech.leuksman.com/view/Server_roles> I see 10 database servers
and hundreds of Apaches/Squids, so I'm dubious.
I think this focus is the point. He ignores the caches because he is
most interested in database performance: what happens during and after a
write, and how to scale that. All the Squids and memcached should work
with their architecture as well.
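
For the curious, the cache-aside pattern MediaWiki uses with memcached
boils down to something like the following minimal Python sketch. The key
scheme and the two *_db helpers are made up for illustration; only the
memcache client calls are the real python-memcached API:

    import memcache

    # Connect to a memcached instance (address is illustrative).
    mc = memcache.Client(['127.0.0.1:11211'])

    def render_page_from_db(title):
        # Stand-in for the real parse/render path (illustrative only).
        return '<html>rendered %s</html>' % title

    def store_page_in_db(title, text):
        # Stand-in for the real database write (illustrative only).
        pass

    def get_rendered_page(title):
        key = 'page:' + title          # hypothetical key scheme
        html = mc.get(key)             # 1. try the cache first
        if html is None:
            html = render_page_from_db(title)  # 2. miss: fall through to the DB
            mc.set(key, html, time=3600)       # 3. repopulate with a 1h TTL
        return html

    def save_page(title, text):
        store_page_in_db(title, text)  # the write goes to the database...
        mc.delete('page:' + title)     # ...and the stale cache entry is dropped

The cache layer sits in front of whatever answers on a miss, so it should
be indifferent to whether that backend is MySQL or their DHT.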
From a pragmatic perspective, lots of other stuff is missing. For example,
they exclusively use a DHT (key-value pairs) for access, not full-blown
SQL (though presumably this could be added).
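
To make the contrast concrete: the core of a DHT is a hash ring that maps
each key to the node that owns it, and the only access interface is
get/put. Here is a toy single-process Python sketch; the node names and
the interface are purely illustrative, not the project's actual Erlang
API:

    import bisect
    import hashlib

    def ring_pos(s):
        # Hash a string to a 160-bit position on the ring.
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    class ToyDHT:
        def __init__(self, nodes):
            # Each node owns the arc of the ring up to its position.
            self.ring = sorted((ring_pos(n), n) for n in nodes)
            self.store = {n: {} for n in nodes}

        def _node_for(self, key):
            points = [p for p, _ in self.ring]
            i = bisect.bisect(points, ring_pos(key)) % len(self.ring)
            return self.ring[i][1]

        def put(self, key, value):
            self.store[self._node_for(key)][key] = value

        def get(self, key):
            return self.store[self._node_for(key)].get(key)

    dht = ToyDHT(['node-a', 'node-b', 'node-c'])
    dht.put('page:Main_Page', '<wikitext>...</wikitext>')
    print(dht.get('page:Main_Page'))

Anything richer than a key lookup (joins, range scans) has to be layered
on top of get/put, which is why full SQL is the missing piece.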
This is a research project, but if their numbers are right, they are an
order of magnitude faster and leaner. Organizational and legal
implications aside, a P2P architecture, like the Internet itself, is
really what you would want for a next-generation MediaWiki.
Now, all I actually wanted to know was how complete plog4u is at
rendering MediaWiki syntax. I guess I shouldn't let my thoughts wander so
much.
Dirk
PS: Would wikitech-l have been a better list to ask this question?
--
Phone: +1 (650) 215 3459, Web: http://www.riehle.org