On 2/18/07, Guillaume Pierre <gpierre@cs.vu.nl> wrote:
> As Gerard said, the Vrije Universiteit Amsterdam is working on distributed decentralized hosting of a wikipedia-like site. Our first results are summarized in an article available here: http://www.globule.org/publi/DWECWH_webist2007.html
The meat of the idea seems to be to use distributed hash tables to spread the main database across multiple mostly-independent machines (i.e., to break away from the inefficient MySQL replication/cluster model). This is absolutely something that should be done. Wikipedia's data model, which is essentially a huge collection of independently editable pages, screams for this kind of partitioning.
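To make the partitioning concrete, here's a minimal consistent-hashing sketch in Python. This is not from the paper; the node names, virtual-node count, and choice of SHA-1 are my own illustration of routing page titles onto mostly-independent database nodes:

    import bisect
    import hashlib

    class HashRing:
        """Toy consistent-hash ring: maps each page title to one of a
        set of database nodes, so adding a node only remaps a fraction
        of the titles instead of reshuffling everything."""

        def __init__(self, nodes, vnodes=64):
            points = []
            for node in nodes:
                for i in range(vnodes):
                    points.append((self._hash("%s#%d" % (node, i)), node))
            points.sort()
            self._points = points
            self._keys = [h for h, _ in points]

        @staticmethod
        def _hash(key):
            return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)

        def node_for(self, title):
            """Route a title to the node owning the next point on the ring."""
            i = bisect.bisect(self._keys, self._hash(title)) % len(self._keys)
            return self._points[i][1]

    ring = HashRing(["db1", "db2", "db3"])
    print(ring.node_for("Amsterdam"))   # deterministic, e.g. 'db1'

The nice property is that node_for() is a pure function of the title, so any front-end can route a request without consulting a central directory, and adding a fourth node only remaps roughly a quarter of the titles.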
I question the benefit of then letting untrusted third parties run the servers, though, because at the end of the paper you acknowledge that all the data has to pass back through trusted parties anyway. I'm not convinced there would be a significant cost savings from introducing untrusted third parties under that constraint. Once you've achieved approximately linear scaling of the database servers, which appropriate use of DHTs should give you, the costs of downloading the data from the untrusted third parties (roughly doubling the bandwidth, since each page travels replica-to-proxy and then proxy-to-user) and of checking the signatures (eating up CPU on the trusted machines) are going to be nearly as great as the cost of simply adding another database server.
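To show the shape of the comparison I have in mind, here's a back-of-envelope sketch. Every number in it is a placeholder I made up for illustration, not a measurement:

    # All prices and sizes below are invented placeholders; only the
    # structure of the comparison matters.

    def untrusted_replica_cost(pages_per_sec,
                               page_bytes=50000,        # assumed page size
                               dollars_per_byte=1e-9,   # assumed bandwidth price
                               verify_secs=1e-4,        # assumed per-signature CPU
                               dollars_per_cpu_sec=1e-5):
        """Marginal $/sec when pages come from untrusted replicas: each
        page crosses the wire twice (replica -> trusted proxy -> user)
        and the trusted proxy verifies its signature."""
        bandwidth = 2 * pages_per_sec * page_bytes * dollars_per_byte
        cpu = pages_per_sec * verify_secs * dollars_per_cpu_sec
        return bandwidth + cpu

    def extra_db_server_cost(pages_per_sec,
                             dollars_per_page=9e-5):    # assumed all-in cost
        """Marginal $/sec of just adding one more trusted DB server
        behind the DHT and serving each page once (bandwidth plus
        amortized hardware, again an assumed figure)."""
        return pages_per_sec * dollars_per_page

    for rate in (100, 1000, 10000):
        print(rate, untrusted_replica_cost(rate), extra_db_server_cost(rate))

With numbers anywhere in this neighborhood, the two curves stay within a small constant factor of each other, which is exactly my worry.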
Of course, I see why you're proposing it: allowing untrusted third parties to interact directly with the end user would require end users to install some sort of client software if they want to authenticate the content. But I really think that's the way you've gotta go if you're going to achieve a real cost savings (or cost distribution). Let the end-user software check the signatures.
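Something like this on the client side (a sketch assuming the third-party pyca/cryptography package and Ed25519; the article bytes and key handling are invented for illustration):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    # The trusted publisher signs each revision once, at write time...
    publisher_key = Ed25519PrivateKey.generate()
    article = b"Amsterdam is the capital of the Netherlands."
    signature = publisher_key.sign(article)

    # ...and the client, holding only the well-known public key,
    # verifies whatever an untrusted mirror hands it. No trusted
    # proxy sits on the read path at all.
    public_key = publisher_key.public_key()
    try:
        public_key.verify(signature, article)
        print("content authentic")
    except InvalidSignature:
        print("mirror served tampered content")

The publisher pays one signature per edit instead of one verification per read, and the per-read verification cost moves onto the reader's CPU, which is the cost distribution I mean.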
Anthony