On 2/12/07, Lars Aronsson lars@aronsson.se wrote:
> The current wiki syntax cannot be described in simple BNF, but it is not impossible to parse.
No, just very difficult, with lots of corner cases, which makes it effectively impossible to guarantee that any tool other than MediaWiki itself (which is correct by definition) can parse the wikitext correctly in all cases. And perhaps more importantly, parse time for MediaWiki itself is, as I recall, somewhere on the order of 800 ms. That's totally unacceptable for real-time use like WYSIWYG.
In theory, a lossless intermediate representation such as XML would be ideal. The XML could then be served to WYSIWYG and other clients and used for rendering, while the wikitext could be served to those who want to edit it directly. The difficulty (if not impossibility) is in making it lossless: you have to be able to convert back and forth without changing anything.
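To make the roundtrip requirement concrete, here is a toy sketch. The two-rule grammar (bold and italic only) and the element names are my invention and nothing like the real wikitext grammar; the point is only the property being tested, namely that converting to XML and back must reproduce the input byte for byte.

```python
import re

# Toy illustration only: a two-rule subset of "wikitext" (bold/italic),
# with invented element names. The interesting property is that to_xml
# and to_wikitext are exact inverses over the whole corpus.

def to_xml(wikitext):
    """Map '''bold''' and ''italic'' markup to XML elements."""
    xml = re.sub(r"'''(.+?)'''", r"<bold>\1</bold>", wikitext)
    xml = re.sub(r"''(.+?)''", r"<italic>\1</italic>", xml)
    return "<page>" + xml + "</page>"

def to_wikitext(xml):
    """Invert to_xml exactly: no normalization, no loss."""
    text = xml.removeprefix("<page>").removesuffix("</page>")
    text = text.replace("<bold>", "'''").replace("</bold>", "'''")
    text = text.replace("<italic>", "''").replace("</italic>", "''")
    return text

def roundtrips(wikitext):
    """True if wikitext survives the conversion unchanged."""
    return to_wikitext(to_xml(wikitext)) == wikitext

# In practice you would run this check over every revision in the dump.
assert roundtrips("A '''bold''' claim with ''some'' nuance.")
```

The real grammar has exactly the corner cases (unbalanced quotes, nesting, templates) that make this hard, which is why testing against the full enwiki database rather than proving it formally seems like the practical route.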
*That* is probably the most interesting question right now from the perspective of stuff like WYSIWYG and third-party use. Formally parsing the current syntax is hopeless. But if we develop a mapping such that the entire enwiki database can be roundtripped, as a test case, *that* will allow all sorts of stuff to work. Once we have an intermediate XML representation, that could probably be turned directly into a rendered page (even including all skin elements) with just XSL, after template substitution. And that could probably be done in real time in most modern languages, including JavaScript. At least I hope.
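For the XSL step, a stylesheet along these lines is what I have in mind. The element names (page, bold, italic) are placeholders for whatever schema the intermediate representation ends up using, not a real one:

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: render the imagined intermediate XML to XHTML.
     The matched element names are assumptions, not an actual schema. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/page">
    <html><body><p><xsl:apply-templates/></p></body></html>
  </xsl:template>
  <xsl:template match="bold">
    <b><xsl:apply-templates/></b>
  </xsl:template>
  <xsl:template match="italic">
    <i><xsl:apply-templates/></i>
  </xsl:template>
</xsl:stylesheet>
```

Since XSLT engines exist for browsers and most server-side languages, the same stylesheet could in principle drive both server rendering and client-side WYSIWYG.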