On Thu, Apr 23, 2009 at 10:28 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
2009/4/23 David Gerard dgerard@gmail.com:
2009/4/23 Anthony wikimail@inbox.org:
I'll let you use p2pedia.org. :)
Suggestion: Distributed git-based backend for MediaWiki.
Usefulness: encouraging forks *and merges*. Now *that* could kick Wikipedia's arse in useful and productive ways.
I recall this being discussed before somewhere (mediawiki-l?). It's an interesting idea, but I don't know enough about git to know if it could actually be made to work (it would need something better than our current edit conflict system, for a start).
You're right that it's been discussed before, but the hits are hard to find, e.g. http://www.foo.be/cgi-bin/wiki.pl/2007-11-10_Dreaming_Of_Mediawiki_Using_GIT
Git would certainly do better than our current edit conflict system; resolving such conflicts is precisely the point of smart DVCS systems. (And it'd make it a lot easier to get dumps and work offline.)
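To make that concrete, here's a hypothetical sketch using nothing but the stock git CLI (the article name, branch names, and text are invented): two editors change *different* paragraphs of the same article, and git's three-way merge combines their edits automatically, exactly the case where MediaWiki today would throw an edit conflict at the second editor.

```shell
# Two concurrent edits to different paragraphs of one article;
# git's three-way merge combines them with no conflict.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.org
base=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on git version

printf 'Intro paragraph.\n\nMiddle section.\n\nBody paragraph.\n' > Article.txt
git add Article.txt
git commit -qm 'initial revision'

git checkout -qb editor-a               # first editor rewrites the intro
printf 'Improved intro.\n\nMiddle section.\n\nBody paragraph.\n' > Article.txt
git commit -qam 'rewrite intro'

git checkout -q "$base"
git checkout -qb editor-b               # second editor rewrites the body
printf 'Intro paragraph.\n\nMiddle section.\n\nExpanded body.\n' > Article.txt
git commit -qam 'expand body'

git checkout -q "$base"
git merge -q editor-a                   # fast-forward
git merge -q --no-edit editor-b         # real three-way merge, resolved automatically
cat Article.txt                         # both edits present, no conflict markers
```

Only genuinely overlapping edits to the same lines would still need a human to resolve, which is a much smaller set than "any two people saved the same page".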
The issue, of course, is performance. The English Wikipedia history, according to http://download.wikimedia.org/enwiki/latest/, is 147.7 gigabytes. Compressed. Now, Git is known for its speed and general efficiency, but even it can't cope with that. It might barely be possible for a single local installation to profitably use Git, but I can't see the actual servers, taking hundreds or thousands of edits a minute, working. Even alternative suggestions like 'make every article an individual git repo' are problematic. And of course any such conversion would be a *massive* programming challenge, to go from MySQL interfacing to Git.
As it happens, I've thought about this before and have a little expertise in the issue. I'm one of the developers of a wiki called Gitit - http://github.com/jgm/gitit/tree/master - written in Haskell. The most interesting thing about Gitit, besides its ability to export articles (written in Markdown or ReST) in various formats such as HTML, PDF, or LaTeX, is that it uses a library called 'filestore' - http://hackage.haskell.org/cgi-bin/hackage-scripts/package/filestore - to access and change articles.
Filestore is an abstraction over Git and Darcs (and a half-finished Sqlite3 backend), and basically follows the ikiwiki model, which is what people think of when they say things like 'I wish my wiki used a DVCS instead of a database': each article is a file tracked by the repository, and the wiki is actually a web front end to the repo. You can 'git clone' it or whatever, but otherwise it acts like a regular wiki.
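A minimal sketch of that model in plain git commands (file name and messages invented; a real filestore save additionally records the wiki author and change description as commit metadata): every save is a commit, so article history and old revisions fall straight out of git.

```shell
# Each wiki save = one commit; article history and old revisions
# come directly from git, and the repo can be cloned like any other.
set -e
wiki=$(mktemp -d)
cd "$wiki"
git init -q
git config user.name Editor
git config user.email editor@example.org

# First save of the article - what the front end would do on 'create page'
printf 'Welcome to the wiki.\n' > FrontPage.page
git add FrontPage.page
git commit -qm 'Created Front Page'

# A later edit - what the front end would do on 'save page'
printf 'Welcome to the wiki.\nNow with more content.\n' > FrontPage.page
git commit -qam 'Expanded Front Page'

git log --oneline -- FrontPage.page     # the article's full history
git show HEAD~1:FrontPage.page          # retrieve the previous revision
```

The wiki's "history" and "view old revision" pages are then just thin wrappers over `git log` and `git show`.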
Performance-wise, filestore has been interesting. It exposed a performance issue in Darcs which we (the Darcs team) fixed, and it has shown that shelling out to binaries to do things on-disk for you isn't all that expensive - I believe on a regular system with a git backend, Gitit can do ~100 page views and edits a second. But it's not at all obvious how things could get much faster than that. So my conclusion is that for very large wikis, DVCS backends may never be competitive performance-wise, although small and medium wikis (particularly ones aimed at developers) can probably benefit from such an approach.