On Sat, Nov 10, 2007 at 11:26:42PM +0100, Merlijn van Deen wrote:
> Most importantly, I think we should stop storing wikitext. Storing
> wikitext makes it hard to make changes in the syntax, because it would
> break pretty much every existing page. Wikitext is an ambiguous way of
> storing 'the way it is meant'; XML is a clear way of doing this. As the
> text is compressed, using wikitext or XML does not make that big of a
> difference.

We went over this one about six months ago; check the archives.
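To put a rough number on the compression argument, here is a minimal Python sketch that compresses the same made-up page stored as wikitext and as XML with zlib. The XML element names (page, p, b, i, link) are a hypothetical schema invented for illustration, not anything MediaWiki uses. A page built by repeating one paragraph compresses far better than a real article, so only the method carries over, not the numbers.

```python
import zlib

# Hypothetical sample page: one paragraph repeated, first as MediaWiki
# wikitext, then in an invented XML schema (page/p/b/i/link).
PARAGRAPH_WIKI = "Some '''bold''' and ''italic'' text with a [[Main Page|link]].\n\n"
PARAGRAPH_XML = ('<p>Some <b>bold</b> and <i>italic</i> text with a '
                 '<link target="Main Page">link</link>.</p>')

WIKI = PARAGRAPH_WIKI * 200
XML = "<page>" + PARAGRAPH_XML * 200 + "</page>"


def sizes(text):
    """Return (raw, compressed) sizes in bytes, using zlib at level 9."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


for label, text in (("wikitext", WIKI), ("xml", XML)):
    raw, packed = sizes(text)
    print(f"{label}: {raw} bytes raw, {packed} bytes compressed")
```

Running the same function over a dump of real pages would be the fairer test of the "not that big of a difference" claim.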

> However, XML makes parsing much easier. Yes, it will need two steps, but
> when regenerating the page from the database, it's much easier (no ugly
> regexps, just a simple SAX parser). Besides, as a pywikipedia developer,
> I'd like to have XML output ;)

Sure, but we *still* need to regularize the parser before we can do
that.
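The "simple SAX parser" step might look roughly like the following Python sketch, which uses the standard library's xml.sax to turn a stored page back into MediaWiki-style wikitext. The storage schema (page, p, b, i, and link with a target attribute) is hypothetical and made up for this example. The sketch says nothing about how the existing parser would have to be regularized first.

```python
import xml.sax

# Hypothetical storage schema, invented for this example.
STORED = (
    '<page><p>Some <b>bold</b> and <i>italic</i> text with a '
    '<link target="Main Page">link</link>.</p></page>'
)


class WikitextRenderer(xml.sax.ContentHandler):
    """Regenerate MediaWiki-style wikitext one SAX event at a time."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.paragraphs = 0

    def startElement(self, name, attrs):
        if name == "p":
            if self.paragraphs:
                self.out.append("\n\n")  # blank line between paragraphs
            self.paragraphs += 1
        elif name == "b":
            self.out.append("'''")
        elif name == "i":
            self.out.append("''")
        elif name == "link":
            self.out.append("[[" + attrs.getValue("target") + "|")

    def endElement(self, name):
        if name == "b":
            self.out.append("'''")
        elif name == "i":
            self.out.append("''")
        elif name == "link":
            self.out.append("]]")

    def characters(self, content):
        # Text may arrive in several chunks; appending handles that.
        self.out.append(content)


def render(xml_text):
    handler = WikitextRenderer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


print(render(STORED))
# prints: Some '''bold''' and ''italic'' text with a [[Main Page|link]].
```

No regexps are involved: each element maps to a fixed opening and closing token, which only works because the stored form is already unambiguous.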

> To summarize: we should switch to storing a much more descriptive format
> so that changes in the wikitext format do not break anything: the wikitext
> can just be generated from the XML, in whichever format you want. This
> means it would be possible to use (cleaned-up) MediaWiki wikitext,
> WikiCreole or many other syntaxes, chosen per user. (Although as far as I
> can see, WikiCreole isn't available as a context-free grammar either.)

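As a sketch of the per-user idea, the Python below renders one stored tree into either MediaWiki or Creole markup with xml.etree.ElementTree. The XML schema (page, p, b, i, link) is invented for illustration. Only paragraphs, bold, italics and links are handled, and a real renderer would also need to escape text that happens to contain markup characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical storage schema, invented for this example.
STORED = (
    '<page><p>Some <b>bold</b> and <i>italic</i> text with a '
    '<link target="Main Page">link</link>.</p></page>'
)

# Opening/closing markup for inline elements in each output dialect.
DIALECTS = {
    "mediawiki": {"b": ("'''", "'''"), "i": ("''", "''")},
    "creole": {"b": ("**", "**"), "i": ("//", "//")},
}


def render(elem, dialect):
    """Recursively render an element and its children in the given dialect."""
    if elem.tag == "page":
        # Paragraphs are separated by a blank line in both dialects.
        return "\n\n".join(render(p, dialect) for p in elem)
    marks = DIALECTS[dialect]
    if elem.tag == "link":
        # MediaWiki and Creole 1.0 happen to share the [[target|label]] form.
        opening, closing = "[[" + elem.get("target") + "|", "]]"
    else:
        opening, closing = marks.get(elem.tag, ("", ""))
    parts = [opening, elem.text or ""]
    for child in elem:
        parts.append(render(child, dialect))
        parts.append(child.tail or "")
    parts.append(closing)
    return "".join(parts)


tree = ET.fromstring(STORED)
for dialect in DIALECTS:
    print(f"{dialect}: {render(tree, dialect)}")
```

Adding another syntax means adding one more entry to the table; the stored tree never changes.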
I should note that computing diffs seems likely to become harder if we
store the parse tree instead of the wikitext... but on this point I'm
willing to admit I might be entirely off base.

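One possible way around the diff problem, sketched in Python under the assumption that rendering is deterministic: regenerate wikitext from each stored revision and run an ordinary line diff with difflib. The schema and the one-line-per-paragraph renderer are hypothetical, and this does not settle whether a diff of the tree itself would be more useful to editors.

```python
import difflib
import xml.etree.ElementTree as ET

# Two hypothetical stored revisions of the same page.
OLD = "<page><p>First paragraph.</p><p>Some <b>bold</b> text.</p></page>"
NEW = "<page><p>First paragraph.</p><p>Some <b>bolder</b> text.</p></page>"


def to_wikitext_lines(xml_text):
    """Render each <p> to one line of MediaWiki-style text (only <b> handled)."""
    lines = []
    for p in ET.fromstring(xml_text).iter("p"):
        parts = [p.text or ""]
        for child in p:
            inner = "".join(child.itertext())
            parts.append("'''" + inner + "'''" if child.tag == "b" else inner)
            parts.append(child.tail or "")
        lines.append("".join(parts))
    return lines


diff = list(difflib.unified_diff(
    to_wikitext_lines(OLD), to_wikitext_lines(NEW),
    fromfile="rev1", tofile="rev2", lineterm="",
))
print("\n".join(diff))
```

The catch is that the diff is only as stable as the renderer: if the serializer's output changes between releases, old revisions would diff noisily against new ones.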
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra(a)baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates
http://baylink.pitas.com '87 e24
St Petersburg FL USA
http://photo.imageinc.us +1 727 647 1274