[WikiEN-l] Citizendium vs. Wikipedia

Gwern Branwen gwern0 at gmail.com
Thu Apr 23 17:12:37 UTC 2009


On Thu, Apr 23, 2009 at 10:28 AM, Thomas Dalton <thomas.dalton at gmail.com> wrote:
> 2009/4/23 David Gerard <dgerard at gmail.com>:
>> 2009/4/23 Anthony <wikimail at inbox.org>:
>>
>>> I'll let you use p2pedia.org.  :)
>>
>>
>> Suggestion: Distributed git-based backend for MediaWiki.
>>
>> Usefulness: encouraging forks *and merges*. Now *that* could kick
>> Wikipedia's arse in useful and productive ways.
>
> I recall this being discussed before somewhere (mediawiki-l?). It's an
> interesting idea, but I don't know enough about git to know if it
> could actually be made to work (it would need something better than
> our current edit conflict system, for a start).

You're right that it's been discussed before, but the discussions are
hard to find; e.g.
http://www.foo.be/cgi-bin/wiki.pl/2007-11-10_Dreaming_Of_Mediawiki_Using_GIT

Git would certainly do better than our current edit conflict system;
resolving such conflicts is precisely what a smart DVCS is designed to do.
(And it'd make it a lot easier to get dumps and work offline.)
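
For concreteness: a three-way merge of two concurrent edits of an
article against their common ancestor is a single call to the git
binary. A minimal Haskell sketch (the file names are invented for the
example), using 'git merge-file -p', which prints the merged text to
stdout and reports conflicts through the exit code:

import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Merge Bob's changes (base -> bob) into Alice's copy of the article.
-- base.txt, alice.txt and bob.txt are hypothetical files on disk.
mergeEdits :: IO ()
mergeEdits = do
  (code, merged, _err) <- readProcessWithExitCode "git"
    ["merge-file", "-p", "alice.txt", "base.txt", "bob.txt"] ""
  case code of
    ExitSuccess   -> putStrLn ("clean merge:\n" ++ merged)
    ExitFailure n -> putStrLn ("merged with " ++ show n ++
                               " conflict hunk(s), markers left inline:\n" ++ merged)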

The issue, of course, is performance. The English Wikipedia history
according to http://download.wikimedia.org/enwiki/latest/ is 147.7
gigabytes. Compressed. Now, Git is known for its speed and general
efficiency, but even it can't cope with that. It might barely be
possible for a single local installation to profitably use Git, but I
can't see the actual servers, which take hundreds or thousands of edits
a minute, working. Even alternative suggestions like 'make every article
an individual git repo' are problematic. And of course any such
conversion would be a *massive* programming challenge: moving MediaWiki
from its MySQL backend to Git.

As it happens, I've thought about this before and have a little
expertise in the issue. I'm one of the developers of a wiki called
Gitit - http://github.com/jgm/gitit/tree/master - written in Haskell.
The most interesting thing about Gitit, besides its ability to export
articles (written in Markdown or ReST) to various formats such as HTML,
PDF, or LaTeX, is that it uses a library called 'filestore' -
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/filestore -
to access and change articles.

Filestore is an abstraction over Git and Darcs (and a half-finished
SQLite3 backend), and basically follows the ikiwiki model, which is what
people think of when they say things like 'I wish my wiki used a DVCS
instead
of a database' - each article is a file which is tracked by the
repository, and the wiki is actually a web front end to the repo. You
can 'git clone' it or whatever, but otherwise it acts like a regular
wiki.
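
In code, the model is about this simple. The sketch below is from
memory rather than from the filestore Haddock docs, so the exact names
and signatures may differ a little, and the repository path, page name,
and author are invented for the example:

import Data.FileStore (Author (..), gitFileStore, initialize, retrieve, save)

main :: IO ()
main = do
  -- One git repository holds the whole wiki; each article is a tracked file.
  let fs = gitFileStore "mywiki"
  initialize fs                               -- roughly a 'git init'
  let editor = Author "Example Editor" "editor@example.org"
  -- An edit submitted through the web front end becomes a commit:
  save fs "Front Page.page" editor "start the front page"
       ("Hello, wiki!" :: String)
  -- A page view is a read of the latest revision of that file:
  txt <- retrieve fs "Front Page.page" Nothing :: IO String
  putStrLn txt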

Performance-wise, filestore has been interesting. It exposed a
performance issue in Darcs which we (the Darcs developers) fixed, and it
has shown that calling binaries to do things on-disk for you isn't all
that expensive - I believe that on a regular system with a git backend,
Gitit can do ~100
page views and edits a second. But it's not at all obvious how things
could get much faster than that. So my conclusion is that for very
large wikis, DVCS backends may never be competitive performance-wise,
although small and medium wikis (particularly ones aimed at
developers) probably can benefit from such an approach.
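
If you want to sanity-check the 'calling binaries is cheap' claim on
your own machine, something as crude as the following is enough (the
repository and page names are made up, and this times only sequential
reads via the git binary, not full Gitit requests):

import Control.Monad (replicateM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Process (readProcess)

-- Time a few hundred page reads, each one shelling out to the git binary.
main :: IO ()
main = do
  let n = 200 :: Int
  start <- getCurrentTime
  replicateM_ n $ do
    _ <- readProcess "git"
           ["--git-dir=mywiki/.git", "show", "HEAD:Front Page.page"] ""
    return ()
  end <- getCurrentTime
  let secs = realToFrac (diffUTCTime end start) :: Double
  putStrLn (show (fromIntegral n / secs) ++ " reads/second")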

-- 
gwern


