Domas, thanks for your insights!
There is a fair amount of work on putting p2p architectures under wiki
engines, but this work was the first to gain broader recognition, i.e.
win a prize at the IEEE Scale 2008 conference. So I'm assuming the work
is technically sound, even if it may not consider all the various
aspects of a real application. I asked one of the original authors to
comment on which of the issues you mention won't work well with their
architecture, or whether they could easily be tacked on. Let's see whether
they'll show up.
Cheers,
Dirk
Domas Mituzas wrote:
Dirk,
Thanks for reviving a topic that hasn't been touched for a while.
I can come up with a scaling architecture that would write fastest
(e.g. appending to a text file can happen at hundreds of megabytes a
second). The problem is that reading that text file may be difficult
afterwards :)
Indeed, databases provide performance bottlenecks, unless they don't.
See, modeling for 'how many transactions can we do' isn't the only
part of site engineering; responsiveness, consistency, etc. are
other issues.
So, for now our task is not to scale out writes, but to scale
reads (and read functionality) while maintaining writes :)
P2P designs work great for isolated data, but our data is very
interdependent (media, templates, links, categories, etc.). It is
difficult to establish data clustering, as there are multiple
views from multiple directions.
Now, once the P2P architecture has to maintain all that, I'd like to
see which performs better under reasonable scaling requirements...
This is a research project, but if their numbers are right, they are
an order of magnitude faster and leaner. Organizational and legal
implications aside, a p2p architecture like the Internet itself is
really what you would want for a next-generation MediaWiki.
There are far more implications than that: maintainability,
extensibility, etc.
However p2p the Internet is, it still has backbones and datacenters, and
depends on very expensive hardware :)
And at macro scale, we're already 'p2p', with multiple languages on
multiple database clusters contributing to a 'cloud' of knowledge. ;-)