Here's an interesting alternative implementation of MediaWiki/Wikipedia:
* http://armstrongonsoftware.blogspot.com/2008/06/itching-my-programming-nerve...
* http://video.google.com/videoplay?docid=6981137233069932108 (the Wikipedia discussion starts 30 minutes into the video)
Basically, it is a p2p backend that claims order-of-magnitude performance gains for writing pages. They ignore the front caches etc. It is done in Erlang (+ Java).
I was trying to figure out whether this would really reach feature parity, but I couldn't fully tell.
For the rendering they use plog4u. Does anyone know whether it has feature parity with MediaWiki markup? We used JAMWiki (a Java implementation of MediaWiki), only to see later that there was no ParserFunctions extension available. (Why is this an extension rather than a core part in the first place?)
Thanks! Dirk
The slides for the talk are on the OnScale site <http://www.onscale.de/Reinefeld_Erlang_Exchange.pdf>, although I don't see an actual performance comparison between the distributed architecture and the current Wikipedia setup.
He seems to ignore not only Squid, but also the key-value store MediaWiki is already well-integrated with: memcached. I think he's talking about something more complex (I only understand parts of it), but I don't think Wikipedia is much of a big dumb behemoth as far as architecture goes; I've always thought of it as the opposite: a lean model of incredible performance on an incredibly small budget.
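For reference, wiring MediaWiki up to memcached is only a few lines in LocalSettings.php; something along these lines, with the server addresses being placeholders:

# Added to LocalSettings.php (illustrative; server addresses are placeholders)
$wgMainCacheType    = CACHE_MEMCACHED;      # use memcached for the main object cache
$wgMemCachedServers = array(
    '10.0.0.1:11211',                       # one or more memcached instances
    '10.0.0.2:11211',
);

Any alternative backend would presumably want to slot in at that same level.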
Anyway, he also seems to assume that the scalability bottleneck is all in the 2000/s write requests, rather than in the 48000/s read requests. Is this actually the case? On the server roles page <https://wikitech.leuksman.com/view/Server_roles> I see 10 database servers and hundreds of Apaches/Squids, so I'm dubious.
I'd be curious to hear what Brion or another Wikimedia engineer has to say about this, if he has time.
Benjamin Lees wrote:
> The slides for the talk are on the OnScale site <http://www.onscale.de/Reinefeld_Erlang_Exchange.pdf>, although I don't see an actual performance comparison between the distributed architecture and the current Wikipedia setup.
> He seems to ignore not only Squid, but also the key-value store MediaWiki is already well-integrated with: memcached. I think he's talking about something more complex (I only understand parts of it), but I don't think Wikipedia is much of a big dumb behemoth as far as architecture goes; I've always thought of it as the opposite: a lean model of incredible performance on an incredibly small budget.
> Anyway, he also seems to assume that the scalability bottleneck is all in the 2000/s write requests, rather than in the 48000/s read requests. Is this actually the case? On the server roles page <https://wikitech.leuksman.com/view/Server_roles> I see 10 database servers and hundreds of Apaches/Squids, so I'm dubious.
I think this focus is the point. He ignores the caches because he is most interested in database performance: what happens during and after a write, and how to scale that. All the Squids and memcached should work with their architecture as well.
From a pragmatic perspective, lots of other stuff is missing. For example, they exclusively use a DHT (key-value pairs) for access, not full-blown SQL (though presumably this could be added).
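To make that concrete, here is a toy sketch. The ToyDht class and the key layout are invented for illustration, and the SQL in the comment is just a simplified version of the kind of categorylinks join MediaWiki would run:

<?php
// With SQL you would ask something like (simplified):
//   SELECT page_title FROM page JOIN categorylinks ON cl_from = page_id WHERE cl_to = 'Physics';
// With a pure key-value store you can only get/put whole values under precomputed keys,
// so the "query result" has to be maintained explicitly at write time.

class ToyDht {                                    // in-memory stand-in for the real overlay
    private $store = array();
    public function get( $key ) {
        return isset( $this->store[$key] ) ? $this->store[$key] : null;
    }
    public function put( $key, $value ) {
        $this->store[$key] = $value;
    }
}

$dht = new ToyDht();
$dht->put( 'category:Physics', array( 'Quantum mechanics', 'Entropy' ) );
$titles = $dht->get( 'category:Physics' );        // no joins, no ad-hoc filtering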
This is a research project, but if their numbers are right, they are an order of magnitude faster and leaner. Organizational and legal implications aside, a p2p architecture like the Internet itself is really what you would want for a next generation MediaWiki.
Now, all I actually wanted to know was how complete plog4u is for rendering MediaWiki syntax. I guess I shouldn't let my thoughts wander so much.
Dirk
PS: Would wikitech-l have been a better list to ask this question?
Dirk,
Thanks for reviving a topic that hasn't been touched for a while.
I can come up with a scaling architecture that writes extremely fast (e.g. appending to a text file can happen at hundreds of megabytes a second).
The problem is that reading that text file may be difficult afterwards :) Indeed, databases provide performance bottlenecks, except when they don't. See, modeling 'how many transactions can we do' isn't the only part of site engineering; responsiveness, consistency, etc. are other issues.
So, for now we have the task not to scale out writes, but to scale reads (and read functionality) and maintain writes :)
P2P designs work great for isolated data; our data is very interdependent (media, templates, links, categories, etc.). It is difficult to establish data clustering, as there are multiple views from multiple directions. Now, once the P2P architecture has to maintain all that, I'd like to see what performs better under reasonable scaling requirements...
> This is a research project, but if their numbers are right, they are an order of magnitude faster and leaner. Organizational and legal implications aside, a p2p architecture like the Internet itself is really what you would want for a next generation MediaWiki.
There are far more implications: simply put, maintainability, extensibility, etc. However p2p the Internet is, it still has backbones and datacenters, and depends on very expensive hardware :)
And at macro scale, we're already 'p2p' - with multiple languages on multiple database clusters contributing to a 'cloud' of knowledge. ;-)
... and, a bit of skepticism aside, we can always add or change data backends to optimize for our scaling needs. The whole application doesn't have to be rewritten.
Domas, thanks for your insights!
There is a fair amount of work on putting p2p architectures under wiki engines, but this work was the first to gain broader recognition (it won a prize at the IEEE Scale 2008 conference). So I'm assuming the work is technically sound, even if it may not consider all the various aspects of a real application. I asked one of the original authors to comment on which of the issues you mention won't work well with their architecture and which could easily be tacked on. Let's see whether they'll show up.
Cheers, Dirk
Hi all,
I am one of the co-authors of the IEEE Scale paper.
On Wednesday 23 July 2008, Dirk Riehle wrote:
> Domas, thanks for your insights!
> There is a fair amount of work on putting p2p architectures under wiki engines, but this work was the first to gain broader recognition (it won a prize at the IEEE Scale 2008 conference). So I'm assuming the work is technically sound, even if it may not consider all the various aspects of a real application. I asked one of the original authors to comment on which of the issues you mention won't work well with their architecture and which could easily be tacked on. Let's see whether they'll show up.
> So, for now we have the task not to scale out writes, but to scale reads (and read functionality) and maintain writes :)
According to the Wikipedia statistics, 95% of the requests are handled by the Squids, and scaling out the Squids is not a hard problem. That is why we only looked at the render farm and the databases.
As I understand the MySQL setup, you are running on a replicated MySQL database: read requests can be answered by any replica, and write requests go to all replicas. Adding more nodes therefore does not increase your write capacity.
In our setup the replication degree is fixed: every item is stored k times, no matter how many nodes you are using. So the write capacity increases with the number of database nodes.
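In back-of-the-envelope terms (the per-node numbers below are made up purely for illustration):

<?php
// Back-of-the-envelope write capacity; the per-node numbers are made up.
$nodes         = 10;    // database nodes
$writesPerNode = 500;   // writes/s a single node can absorb
$k             = 3;     // fixed replication degree

// Full replication: every write hits every replica, so capacity stays flat.
$fullReplication = $writesPerNode;

// Fixed degree k: each write costs k node-writes, so capacity grows with the node count.
$fixedDegree = (int) ( $nodes * $writesPerNode / $k );

echo "full replication:     $fullReplication writes/s\n";   // 500, no matter how many nodes
echo "replication degree k: $fixedDegree writes/s\n";       // ~1666, and growing with N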
> P2P designs work great for isolated data; our data is very interdependent (media, templates, links, categories, etc.). It is difficult to establish data clustering, as there are multiple views from multiple directions. Now, once the P2P architecture has to maintain all that, I'd like to see what performs better under reasonable scaling requirements...
That is indeed a problem. Our store only supports key-value pairs. You can see it as a large map/dictionary. We basically denormalized the SQL schema.
So we have one map from title names to their content (a list of versions):
  "title name" -> [page_content]
Another map stores the pages belonging to a category:
  "category name" -> [title names]
You can add most features in this way.
When you update a page you have to update several of these maps. But that is what the transactions are for.
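Sketched in pseudo-PHP rather than our actual Erlang implementation, with the map keys and the savePage() helper invented for illustration, a save looks roughly like this:

<?php
// Denormalized layout, with plain arrays standing in for the distributed store.
$store = array(
    'page:Physics'     => array(),   // "title name"    -> [page_content] (list of revisions)
    'category:Science' => array(),   // "category name" -> [title names]
);

// Saving a page touches several maps, so the updates go into one transaction.
function savePage( &$store, $title, $text, $category ) {
    // txBegin();                                      // hypothetical transaction API
    $store["page:$title"][] = $text;                   // append a new revision
    if ( !in_array( $title, $store["category:$category"] ) ) {
        $store["category:$category"][] = $title;       // keep the category map in sync
    }
    // txCommit();                                     // both maps change atomically, or neither
}

savePage( $store, 'Physics', 'A new revision of the page text', 'Science' );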
> This is a research project, but if their numbers are right, they are an order of magnitude faster and leaner. Organizational and legal implications aside, a p2p architecture like the Internet itself is really what you would want for a next generation MediaWiki.
We looked at several scenarios here. You could run a p2p system within your data center because it scales better, is easier to maintain, etc. You could run one p2p overlay over several datacenters (here: Florida, Amsterdam, and South Korea); then you have to take care of data placement and network partitioning. Or you could run the p2p overlay over the users' PCs, but then you run into trust issues.
Thorsten
Hello,
> I am one of the co-authors of the IEEE Scale paper.
Nice work! My hat is off :)
> According to the Wikipedia statistics, 95% of the requests are handled by the Squids, and scaling out the Squids is not a hard problem. That is why we only looked at the render farm and the databases.
That's for anonymous users. Logged-in users still hit the cluster.
Actually, only 0.2% of the requests hitting the backend are saves (surprise!!!!). There's lots of other stuff being done: previews, searches, watchlists, actual browsing, various meta stuff, etc.
> As I understand the MySQL setup, you are running on a replicated MySQL database: read requests can be answered by any replica, and write requests go to all replicas. Adding more nodes therefore does not increase your write capacity.
That's exactly right. We definitely have a scaling bottleneck there. Our easiest way to scale writes is splitting off languages, and it will take quite some time to hit trouble on any non-English language (and English is doing fine at the moment too, with relatively low-grade database hardware). Actually, we did hit a scaling bottleneck once upon a time, back when we were saving revision texts into the core database (and actually into the main tables, in pre-1.5 times). It was a single-day (or 15-minute, to add some dramatic effect) hack to move that stuff out of the core databases.
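(That hack lives on as MediaWiki's external storage; roughly this kind of configuration moves new revision text to a separate cluster. The hosts and credentials below are placeholders, and the exact array format should be checked against the ExternalStore documentation.)

# Added to LocalSettings.php (illustrative; hosts and credentials are placeholders)
$wgExternalServers = array(
    'cluster1' => array(
        array( 'host' => 'es1.example.org', 'dbname' => 'extstore',
               'user' => 'wikiuser', 'password' => 'secret',
               'type' => 'mysql', 'load' => 1 ),
    ),
);
$wgDefaultExternalStore = array( 'DB://cluster1' );   # new revision text goes to cluster1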
> In our setup the replication degree is fixed: every item is stored k times, no matter how many nodes you are using. So the write capacity increases with the number of database nodes.
That's indeed nice for anything that requires simple key-value storage (and we definitely have such components, such as revision text storage and various other simple metadata).
> When you update a page you have to update several of these maps. But that is what the transactions are for.
Well, in this case, we end up with plenty of maps:
* Pages by unique ID, pages by name and title, pages by random value (ha!), pages by length (just in case).
* Every page has multiple revisions, which are saved by page, by id, by timestamp, by timestamp-per-page, by timestamp-per-user, and by timestamp-per-usertext (for anons, mostly).
* Every page links to other pages, and is linked from other pages.
* Every page is embedded as a template somewhere else, or embeds templates.
* Every page is in a category, or is an actual category.
* Every page has broken links, which are tracked too.
* Every page has images that have to be tracked.
* Every page has external links.
* Every page has ...
And the biggest problem is that for every one of these maps there are range scans or multiple reference reads. This leads to reading from 100 nodes (unless lots of clever data clustering is employed) for every read done. Of course, the complexity of writes, when you don't go after infinite scaling, is much bigger too.
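Here is a toy illustration of the fan-out; the key layout and hash placement are invented for this example:

<?php
// Toy illustration: naive hash placement spreads one page's revision history
// across almost every node, so a single "history" read fans out cluster-wide.
$nodes   = 100;
$touched = array();

for ( $rev = 1; $rev <= 500; $rev++ ) {                 // 500 revisions of one page
    $key  = "revision:Main_Page:$rev";                  // invented key layout
    $node = abs( crc32( $key ) ) % $nodes;              // naive hash placement
    $touched[$node] = true;
}

echo count( $touched ) . " of $nodes nodes touched\n";  // close to 100 unless keys are clustered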
> We looked at several scenarios here. You could run a p2p system within your data center because it scales better, is easier to maintain, etc. You could run one p2p overlay over several datacenters (here: Florida, Amsterdam, and South Korea); then you have to take care of data placement and network partitioning. Or you could run the p2p overlay over the users' PCs, but then you run into trust issues.
Nah, users' PCs are out of the question. We'd really want to see nice scalable stores for some of our data that works great with key-value stores. We can probably also offload some of the biggest maps to somewhere 'out there'.
Another problem is that servers die in batches. It is much easier to put one database server on one power feed and another database server on another, or place them in separate datacenters. Once you have 1000 nodes that have to have HA characteristics as a whole, the lack of understanding which data goes where (think availability zones) can lead to data loss, either temporary or permanent.
Of course, it is a matter of engineering, but many such problems are not resolved at the research stage.
Dirk Riehle wrote:
[...]
> For the rendering they use plog4u. Does anyone know whether it has feature parity with MediaWiki markup? We used JAMWiki (a Java implementation of MediaWiki), only to see later that there was no ParserFunctions extension available. (Why is this an extension rather than a core part in the first place?)
If the only thing missing from JAMWiki was ParserFunctions, that would be very impressive. ParserFunctions is simple. And indeed, there's a lot of really impressive code in there, although it's easy to find edge cases that don't work the same way.
But I thought I'd better test its performance, before I got too excited and started integrating it into MediaWiki. It turns out that it's full of O(N^2) cases, which made my usual testing method using repeated text to measure loop performance rather difficult.
For example, for the test text str_repeat("'''b''' ", 1000), JAMWiki showed O(N^2) performance:
1000 iterations: 1148ms
2000 iterations: 3916ms
4000 iterations: 15320ms
For str_repeat("[http://a] ", 1000), it took so long that I gave up waiting. MediaWiki does either of these things in linear time, on the order of hundreds of microseconds per loop.
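The test itself is nothing fancy; a minimal sketch of its shape, with parse() as a placeholder for whichever parser is being timed:

<?php
// Minimal sketch of the scaling test; parse() is a placeholder for whichever
// parser is under test (MediaWiki's, or JAMWiki driven from Java).
function parse( $text ) {
    return $text;                                 // stand-in so the sketch runs
}

foreach ( array( 1000, 2000, 4000 ) as $n ) {
    $text  = str_repeat( "'''b''' ", $n );
    $start = microtime( true );
    parse( $text );
    $ms = (int) ( ( microtime( true ) - $start ) * 1000 );
    echo "$n iterations: {$ms}ms\n";              // a ~4x jump per doubling indicates O(N^2)
}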
It's unfortunate that a modern parser generator for a supposedly fast language like Java can't match hand-optimised PHP for speed. It's not like we've set a high bar here.
-- Tim Starling
Tim Starling wrote:
> If the only thing missing from JAMWiki was ParserFunctions, that would be very impressive. ParserFunctions is simple. And indeed, there's a lot of really impressive code in there, although it's easy to find edge cases that don't work the same way.
True; it was just one of the first things we ran into with basic rendering of Wikipedia pages.
> For str_repeat("[http://a] ", 1000), it took so long that I gave up waiting. MediaWiki does either of these things in linear time, on the order of hundreds of microseconds per loop.
[...]
> It's unfortunate that a modern parser generator for a supposedly fast language like Java can't match hand-optimised PHP for speed. It's not like we've set a high bar here.
I'm not sure about not having set a high bar... However, we can confirm the parser generator vs. hand-optimized parser issue. You just showed that JFlex, the parser generator used by JAMWiki, doesn't scale up nicely. We found the same for ANTLR, another parser generator for Java, which also doesn't perform as well as MediaWiki when run against stripped-down pages (our parser parses Wiki Creole, which at a stripped-down level is equivalent to MediaWiki syntax) [1]. MediaWiki performed equally well or better; in general I think the advantage of a parser generator is easier maintainability and clarity of the grammar (you can view the grammar as a domain-specific language for describing acceptable syntax), but not performance :-(
Thanks for your insights!
Dirk
[1] http://www.riehle.org/2008/07/19/a-grammar-for-standardized-wiki-markup/