On 11/11/07, Merlijn van Deen <valhallasw@arctus.nl> wrote:
Most importantly, I think we should stop storing wikitext. Storing wikitext makes it hard to make changes in the syntax, because it would break pretty much every existing page. Wikitext is an ambiguous way of storing 'the way it is meant'; XML is a clear way of doing this. As the text is compressed, using wikitext or XML does not make that big of a difference.
Interesting idea, and it does mean you can update the syntax whenever you want. However, if you make a big change, the amount of broken syntax that gets *written* will increase.
To change the format to XML (and update the wikitext format at the same time), we need four important things: an 'old wikitext'->XML converter, an XML->'good wikitext' converter, a 'good wikitext'->XML converter and an XML->HTML parser (s/converter/parser/, if you care about the exact words). The 'good wikitext' and HTML parsers should be fairly easy; the first is just plain hard.
I've only ever used one system that worked like that: LambdaMOO. When you write code in that system, it compiles it to bytecode, then decompiles it next time you want to edit it. It had some interesting quirks though:
- Whitespace was self-normalising (not a bad thing)
- Parentheses were self-normalising (sometimes a confusing thing)
- /* Comments */ were stripped out and not stored (a stupid thing)
- You couldn't save non-compiling code
I find your suggestion of replacing one parser with four parsers a bit scary though. Admittedly one of those parsers (old wikitext -> XML) is not needed in the long term, and the XML->XHTML renderer would be pretty simple. But it does mean that every change to the grammar needs to be carefully implemented in both parsing and de-parsing.
I guess every test case would also involve a compulsory roundtrip. If it doesn't survive the roundtrip perfectly, it fails.
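To make the roundtrip idea concrete, here is a rough sketch in Python. The grammar is a toy I made up for illustration (a single paragraph with optional '''bold''' spans), not real MediaWiki syntax, and the function names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Toy grammar (an assumption, not MediaWiki's real syntax):
# one paragraph of plain text with optional '''bold''' spans.
BOLD = re.compile(r"'''(.+?)'''")


def wikitext_to_xml(text):
    """Parse the toy wikitext into a <p> element with <b> children."""
    p = ET.Element("p")
    p.text = ""
    last = None  # most recent <b>; plain text after it goes in its .tail
    pos = 0

    def add_plain(chunk):
        if last is None:
            p.text += chunk
        else:
            last.tail = (last.tail or "") + chunk

    for m in BOLD.finditer(text):
        add_plain(text[pos:m.start()])
        last = ET.SubElement(p, "b")
        last.text = m.group(1)
        pos = m.end()
    add_plain(text[pos:])
    return p


def xml_to_wikitext(p):
    """De-parse: regenerate the toy wikitext from the stored XML."""
    out = [p.text or ""]
    for child in p:
        out.append("'''" + (child.text or "") + "'''")
        out.append(child.tail or "")
    return "".join(out)


def survives_roundtrip(text):
    """A test case passes only if wikitext -> XML -> wikitext is the identity."""
    return xml_to_wikitext(wikitext_to_xml(text)) == text
```

Every test case would then call something like survives_roundtrip() on its input. A parser that normalised whitespace the way LambdaMOO did would fail such a check unless the test inputs were already in normal form.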
To summarize: we should switch to storing a much more descriptive format, so that changes in the wikitext format do not break anything: the wikitext can just be generated from the XML, in whichever format you want. This means it should be able to use (cleaned up) MediaWiki wikitext, WikiCreole or many other systems, per user. (Although as far as I can see, WikiCreole isn't available as a context-free grammar either.)
That's quite a big benefit - if we use/invent a "standard" XML format, we would be interoperable with any other wiki software that used it. Templates etc notwithstanding.
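The per-user generation could look roughly like the sketch below. The stored XML vocabulary (<p>, <b>, <i>), the DIALECTS table and the render() function are assumptions of mine for illustration; only the bold and italic delimiters themselves are taken from MediaWiki and WikiCreole:

```python
import xml.etree.ElementTree as ET

# Delimiters each dialect wraps around an element; tags not listed
# (such as <p>) are emitted without any markup.
DIALECTS = {
    "mediawiki": {"b": "'''", "i": "''"},
    "creole": {"b": "**", "i": "//"},
}


def render(elem, dialect):
    """Generate wikitext for one stored element in the chosen dialect."""
    mark = DIALECTS[dialect].get(elem.tag, "")
    inner = elem.text or ""
    for child in elem:
        inner += render(child, dialect) + (child.tail or "")
    return mark + inner + mark


# The same stored page, shown to two users with different preferences.
stored = ET.fromstring("<p>A <b>bold</b> and <i>italic</i> word.</p>")
print(render(stored, "mediawiki"))  # A '''bold''' and ''italic'' word.
print(render(stored, "creole"))     # A **bold** and //italic// word.
```

This only covers the XML->wikitext direction. Each dialect would still need its own wikitext->XML parser, which is where the real work is.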
Steve