[Foundation-l] Possibility of a git-based fully distributed Wikipedia

gwern0 at gmail.com
Mon Feb 18 20:12:59 UTC 2008


On 2008.02.18 17:56:46 +0000, Thomas Dalton <thomas.dalton at gmail.com> scribbled 0.2K characters:
> > Everybody has their own clone of the whole project, or maybe of part of it.
>
> That's a lot of hard drive space! Is it practical to have a
> distributed approach to a project as large as enwiki?

Yes, but it puts you in a bit of a dilemma. I see a few options forced on us by the torrent of changes in Recent Changes:

# Try to keep everybody's repositories up to date. The problem here is that I don't know how large a day's worth of Recent Changes would be, even compressed and stored as minimal diffs, but I suspect the bandwidth cost would be prohibitive.
# Accept that people will always be out of date. This isn't pleasant either, because some WP processes, like speedy deletion (CSD), move very fast, and it needlessly increases edit conflicts.
# Try some more imaginative solution.
## Here I'm thinking about [[Lazy evaluation]] - why should everyone try to keep an entire backup of Wikipedia on their hard drives? No doubt it would do wonders for Wikipedia's disaster recovery preparedness, but it's unnecessary.
### Is anyone here familiar with the [[Filesystem in Userspace|FUSE]] filesystem [[WikipediaFS]]? Or [[wikifs]]? When I think about them, it strikes me that it would be quite neat to have something similar: whenever you opened an 'article', the software would create the text file with the source wikitext* but _also_ download that article's DVCS repository, and it could then automatically record your changes as a patch and send them off. That way, you reap the benefits of a DVCS without downloading all of Wikipedia, and it is as convenient as pointing your text editor at a particular file (which may be even easier than going to a website and using your browser's crippled editor).
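
To make "record your changes as a patch" concrete, here is a minimal sketch in Python, assuming the editor saves the whole article text and the software keeps the version it originally fetched. `make_patch` is a hypothetical helper, not part of WikipediaFS, wikifs, or any DVCS; it uses the standard library's `difflib` to produce a unified diff, the same textual diff format `git diff` emits.

```python
import difflib


def make_patch(title: str, before: str, after: str) -> str:
    """Hypothetical helper: unified diff turning `before` into `after`.

    Returns an empty string when the text is unchanged, so there is
    nothing to send off.
    """
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{title}",  # git-style path prefixes, purely cosmetic here
        tofile=f"b/{title}",
    )
    return "".join(diff_lines)


if __name__ == "__main__":
    original = "Foo is a bar.\n"
    edited = "Foo is a baz.\n"
    print(make_patch("Foo", original, edited), end="")
```

Only the changed lines (plus a little context) travel over the wire, which is the bandwidth advantage over re-uploading the whole article.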

What disadvantages does this have? Well, it's not truly decentralized; I can't see how to do it without a FUSE-alike (and FUSE is only for Mac and Linux systems, and so far as I know, definitely not for Windows); and it's probably a bit hard to grasp, or too technical for many editors.

* WikipediaFS works like this: whenever your programs ask for a file in a directory it controls, it quietly interprets the filename as the name of a Wikipedia article, downloads the article's wikitext, and places it into a real file; if that file is modified, WikipediaFS uploads the modified text to Wikipedia. So it all happens 'on demand', lazily.
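
The on-demand behaviour described above can be modelled without a real filesystem. The following is a toy Python sketch, not WikipediaFS's actual code: `LazyArticleStore` is a hypothetical class, and the `fetch` and `upload` callables are assumptions standing in for whatever would really talk to the wiki, injected so the example runs offline. An article is downloaded only the first time it is read, and an upload happens only when the saved text differs from what was fetched.

```python
from typing import Callable, Dict


class LazyArticleStore:
    """Hypothetical toy model of on-demand article files (not WikipediaFS itself)."""

    def __init__(self, fetch: Callable[[str], str],
                 upload: Callable[[str, str], None]) -> None:
        self._fetch = fetch    # assumed: title -> wikitext
        self._upload = upload  # assumed: (title, new wikitext) -> None
        self._cache: Dict[str, str] = {}

    def read(self, filename: str) -> str:
        # Treat the filename as an article title; download only on first access.
        if filename not in self._cache:
            self._cache[filename] = self._fetch(filename)
        return self._cache[filename]

    def write(self, filename: str, text: str) -> bool:
        # Upload only if the text actually changed; report whether we uploaded.
        if self.read(filename) == text:
            return False
        self._upload(filename, text)
        self._cache[filename] = text
        return True


if __name__ == "__main__":
    fake_wiki = {"Lazy evaluation": "Old wikitext."}  # stand-in for the server
    store = LazyArticleStore(fetch=fake_wiki.__getitem__,
                             upload=fake_wiki.__setitem__)
    store.read("Lazy evaluation")                     # first access: fetched
    store.write("Lazy evaluation", "New wikitext.")  # changed: uploaded
    print(fake_wiki["Lazy evaluation"])
```

Articles nobody opens are never downloaded, which is the whole point of doing it lazily.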

--
gwern
Recce .375 Spetznaz Belknap radint Mafia Gorizont LABLINK Kyudanki server