Re: [Mediawiki-l] Alternative implementations

24 Jul 2008

Hi all,

I am one of the co-authors of the IEEE Scale paper.

On Wednesday 23 July 2008, Dirk Riehle wrote:
...
  Domas, thanks for your insights!

 There is a fair amount of work on putting p2p architectures under wiki
 engines but this work was the first to gain broader recognition, i.e.
 win a prize at the IEEE Scale 2008 conference. So I'm assuming the work
 is technically sound, even if it may not consider all the various
 aspects of a real application. I asked one of the original authors to
 comment on which of the issues you mention won't work well with their
 architecture or whether they could easily be tacked on. Lets see whether
 they'll show up. 
...
 > So, for now we have the task not to scale out writes, but to scale
 > reads (and read functionality) and maintain writes :)

According to the Wikipedia statistics, 95% of the requests are handled by the 
squids, and scaling out the squids is not a hard problem. That is why we only 
looked at the render farm and the databases.

As I understand your MySQL setup, you are running a replicated MySQL 
database. Read requests can be answered by any replica, but write requests go 
to all replicas. As a result, adding more nodes does not increase your write 
capacity.

In our setup the replication degree is fixed: every item is stored k times, no 
matter how many nodes you are using. So the write capacity increases with 
the number of database nodes.
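
As a rough back-of-the-envelope sketch of that difference (the function names 
and the per-node write rate are illustrative assumptions, not measurements 
from our paper or from Wikipedia, and load is assumed to be spread evenly):

```python
def full_replication_write_capacity(nodes: int, per_node_writes: float) -> float:
    """Every write is applied on every replica, so extra nodes add no write capacity."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return per_node_writes


def fixed_degree_write_capacity(nodes: int, per_node_writes: float, k: int) -> float:
    """Each item lives on exactly k nodes, so one logical write costs k physical writes."""
    if k < 1 or nodes < k:
        raise ValueError("need 1 <= k <= nodes")
    return nodes * per_node_writes / k
```

Under these idealized assumptions, going from 12 to 24 nodes leaves the first 
number unchanged and doubles the second.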

...
  >
 > P2P designs work great for isolated data, our data is very
 > interdependent (media, templates, links, categories, etc). It is
 > difficult to establish data clustering easily, as there're multiple
 > views from multiple directions.
 > Now, once the P2P architecture has to maintain all that, I'd like to
 > see what performs better in reasonable scaling requirements...

That is indeed a problem. Our store only supports key-value pairs. You can see 
it as a large map/dictionary. We basically denormalized the SQL schema.

So we have one map for mapping title names to their content (list of 
versions):
"title name" -> [page_content]
Another map stores the pages belonging to a category:
"category name" -> [title names]
You can add most features in this way.

When you update a page you have to update several of these maps. But that is 
what the transactions are for.
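
To make the idea concrete, here is a toy in-memory sketch. It is not our 
store's API: the class name, the method name and the copy-then-swap "commit" 
are assumptions made only for illustration. It keeps the two maps described 
above and updates both of them all-or-nothing when a page is saved.

```python
import copy


class TinyStore:
    """Toy key-value store with two denormalized maps (illustrative only)."""

    def __init__(self):
        self.pages = {}       # "title name" -> [page_content] (list of versions)
        self.categories = {}  # "category name" -> [title names]

    def save_page(self, title, content, categories):
        """Append a new version and update the category maps atomically."""
        # Work on private copies; nothing is visible until the final swap.
        pages = copy.deepcopy(self.pages)
        cats = copy.deepcopy(self.categories)

        pages.setdefault(title, []).append(content)
        for name in categories:
            if not name:
                # Abort: the copies are discarded, the store stays unchanged.
                raise ValueError("empty category name")
            members = cats.setdefault(name, [])
            if title not in members:
                members.append(title)

        # "Commit": both maps are replaced together.
        self.pages, self.categories = pages, cats
```

Removing a page from categories it no longer belongs to is left out to keep 
the sketch short; it would be one more map update inside the same transaction.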

...
 >> This is a research project, but if their numbers are right, they are
 >> an order of magnitude faster and leaner. Organizational and legal
 >> implications aside, a p2p architecture like the Internet itself is
 >> really what you would want for a next generation MediaWiki.

We looked at several scenarios here. You could run a p2p system within your 
data center because it scales better, is easier to maintain, etc. You could 
run one p2p overlay over several datacenters (here: Florida, Amsterdam, South 
Korea). Then you have to take care of data placement and network 
partitioning. Or you could run the p2p overlay over the users' pcs. But then 
you run into trust issues.

Thorsten
