Domas, thanks for your insights!
There is a fair amount of work on putting p2p architectures under wiki engines, but this work was the first to gain broader recognition, i.e. it won a prize at the IEEE Scale 2008 conference. So I'm assuming the work is technically sound, even if it may not consider all the various aspects of a real application. I asked one of the original authors to comment on which of the issues you mention won't work well with their architecture, or whether they could easily be tacked on. Let's see whether they show up.
Cheers, Dirk
Domas Mituzas wrote:
Dirk,
Thanks for reviving a topic that hasn't been touched for a while.
I can come up with a scaling architecture that would write fastest (e.g. appending to a text file can happen at hundreds of megabytes a second).
The problem is that reading that text file afterwards may be difficult :) Indeed, databases create performance bottlenecks, unless they don't. See, modeling for 'how many transactions can we do' isn't the only part of site engineering; responsiveness, consistency, etc. are other issues.
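A minimal sketch of that trade-off, assuming a hypothetical append-only revision log (an illustration only, not how MediaWiki stores anything): every write is a single sequential append, but without an index, finding the latest revision of one page means scanning the whole log.

```python
import io

class AppendOnlyLog:
    """Hypothetical append-only revision log: cheap writes, expensive reads."""

    def __init__(self):
        # In-memory stand-in for a text file on disk.
        self._buf = io.StringIO()

    def write(self, page, text):
        # One sequential append per write, which is why writes are fast.
        # Assumption: page names and text contain no tabs or newlines.
        self._buf.write(f"{page}\t{text}\n")

    def read_latest(self, page):
        # No index: every read scans the entire log from the start,
        # so read cost grows linearly with the total number of writes.
        latest = None
        for line in self._buf.getvalue().splitlines():
            name, _, text = line.partition("\t")
            if name == page:
                latest = text
        return latest

log = AppendOnlyLog()
log.write("Main_Page", "v1")
log.write("Talk:Main_Page", "hello")
log.write("Main_Page", "v2")
print(log.read_latest("Main_Page"))  # v2
print(log.read_latest("Missing"))    # None
```

Making reads cheap means adding indexes, caches or replicas, and each of those puts work back on the write path.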
So, for now our task is not to scale out writes, but to scale reads (and read functionality) while maintaining writes :)
P2P designs work great for isolated data, but our data is very interdependent (media, templates, links, categories, etc.). It is difficult to cluster the data, as there are multiple views from multiple directions. Once the P2P architecture has to maintain all that, I'd like to see which performs better under reasonable scaling requirements...
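A rough sketch of why this interdependence hurts partitioning, assuming hypothetical hash partitioning of pages across a small cluster (`NUM_NODES`, `DEPENDENCIES` and the page names are illustrative, not real data): rendering one page reads keys that usually live on other nodes, and the reverse view, an edit to a shared template, has to reach every node that stores a page using it.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(key):
    # Hash partitioning: a stable assignment of each key to one node.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Hypothetical dependency graph: what rendering each page needs to read.
DEPENDENCIES = {
    "Paris": ["Template:Infobox_city", "File:Eiffel_Tower.jpg",
              "Category:Capitals_in_Europe"],
    "Berlin": ["Template:Infobox_city", "Category:Capitals_in_Europe"],
    "Vilnius": ["Template:Infobox_city", "Category:Capitals_in_Europe"],
}

def nodes_touched(page):
    # Forward view: every node holding something needed to render the page.
    keys = [page] + DEPENDENCIES.get(page, [])
    return {node_for(k) for k in keys}

def invalidation_targets(dependency):
    # Reverse view: an edit to a shared item (e.g. a template) must reach
    # every node that stores a page using it.
    return {node_for(p) for p, deps in DEPENDENCIES.items() if dependency in deps}

print(f"Rendering 'Paris' contacts {len(nodes_touched('Paris'))} of {NUM_NODES} nodes")
print(f"Editing the template affects nodes {sorted(invalidation_targets('Template:Infobox_city'))}")
```

No single key layout keeps both views local, which is the clustering problem described above.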
This is a research project, but if their numbers are right, they are an order of magnitude faster and leaner. Organizational and legal implications aside, a p2p architecture like the Internet itself is really what you would want for a next generation MediaWiki.
There are far more implications: maintainability, extensibility, etc. However p2p the Internet is, it still has backbones and datacenters, and depends on very expensive hardware :)
And at macro scale, we're already 'p2p', with multiple languages on multiple database clusters contributing to a 'cloud' of knowledge. ;-)