On 7/4/07, David Gerard <dgerard@gmail.com> wrote:
> Heh. At work, I deal with most of the wiki engines anyone has ever heard of. Let me assure you: *all* the various wiki markups are horrible, and I don't think any of them were "designed" per se. MediaWiki's is, I understand, provably impossible to put in EBNF, which is why the parser documentation is the parser code ...
I imagine it would be possible to put all of it in EBNF if you had unlimited lookahead, although I don't claim to be any kind of expert on language theory. Of course, you'd need different grammars for different content languages, and different grammars depending on what options the wiki chose (e.g., aliases for the image and category namespaces), at least if you wanted to parse it all properly rather than treating images, categories, links with a namespace, links with no namespace, and interwiki links all the same. WikiCreole at least sidesteps that, but at the cost of losing features (no namespaces, no categories, no templates).
> I would hope WikiCreole would pay attention to being well-formed as a language, at least to the extent of being EBNFable, and extended with a view to remaining so ...
Well, searching wikicreole.org for EBNF leads to zero hits, and BNF gives one unrelated hit, so I can't muster much optimism in that regard. A cursory look at the standard reveals tidbits like "a line starting with ** (including optional whitespace before and afterwards), immediately following any list element on a line above, will be treated as a nested unordered list element. Otherwise it will be treated as the beginning of bold text." In other words, what ** means depends on what the previous line was. That doesn't sound promising to me, but hey, I don't know much about formal language theory.
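To make the problem with that rule concrete, here is a rough Python sketch of how a line-based parser might have to handle it. It is only an illustration under my own assumptions: the function name and the "kind" labels are made up, and treating any line that starts with * or # as a list element is a simplification, not what the WikiCreole standard or any real parser does.

```python
def classify_lines(lines):
    """Label each line, resolving the '**' ambiguity from the previous line.

    Simplifying assumption (not taken from the standard): any line whose
    first non-blank character is '*' or '#' counts as a list element.
    """
    kinds = []
    prev_was_list_item = False
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("**"):
            # The same two characters mean different things depending on
            # what the line above was classified as.
            kind = "nested-list-item" if prev_was_list_item else "bold-start"
        elif stripped[:1] in ("*", "#"):
            kind = "list-item"
        else:
            kind = "text"
        prev_was_list_item = kind in ("list-item", "nested-list-item")
        kinds.append(kind)
    return kinds


if __name__ == "__main__":
    sample = ["* item", "** nested", "plain text", "** bold** text"]
    for line, kind in zip(sample, classify_lines(sample)):
        print(f"{kind:18} {line}")
```

The point is just that the classifier has to carry state from one line to the next; the token ** alone doesn't tell you what you're looking at.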
WikiCreole also requires that [[http://www.example.com]] be parsed differently from [[http:/haha I tricked you]], which (like MW markup) requires an arbitrary amount of lookahead depending on the available protocols. Furthermore, it doesn't specify which protocols are to be supported, so potentially the grammar could even change within a single wiki package if differently configured (this is the case for MediaWiki's $wgUrlProtocols, for instance).
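Again as a rough sketch rather than anything authoritative: the function below decides whether the text between [[ and ]] is an external or an internal link by checking it against a configurable protocol list. The function name and the example protocol tuple are my own assumptions (the tuple is not MediaWiki's actual default for $wgUrlProtocols); the only thing it is meant to show is that the answer depends on configuration.

```python
def classify_link(target, protocols):
    """Classify the text found between [[ and ]] as 'external' or 'internal'.

    `protocols` plays the role of a per-wiki setting such as MediaWiki's
    $wgUrlProtocols; the values used below are illustrative only.
    """
    lowered = target.lower()
    for proto in protocols:
        # The parser has to read as far as the longest configured prefix
        # before it knows which kind of link it is looking at.
        if lowered.startswith(proto.lower()):
            return "external"
    return "internal"


if __name__ == "__main__":
    wiki_a = ("http://", "https://", "ftp://", "mailto:")
    wiki_b = wiki_a + ("gopher://",)
    for target in ("http://www.example.com",
                   "http:/haha I tricked you",
                   "gopher://example.org"):
        print(target, "|", classify_link(target, wiki_a),
              "|", classify_link(target, wiki_b))
```

The same input, gopher://example.org, comes out as an internal link on one wiki and an external link on the other, so there isn't one fixed grammar to write down.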
So I suspect that hasn't been a major goal of the project. I'm not even sure it's realistically possible, sticking with purely wikitext-style markup, unless you want to invent a fantastic menagerie of creatively interspersed punctuation marks for every variation of markup, which would of course need to be memorized by rote. Then you could do things like say that [! !] or God knows what is an external link, for instance, doing away with the need for lookahead at the URL protocol; use something different for bold and lists; etc. MediaWiki uses only three different markups ([[ ]], [ ], and {{ }}) to cover internal links, external links, interwiki links, image inclusions, category inclusions, template inclusions, and direct media links, for instance. If you wanted to use seven different markups for those instead, it would be more easily parseable but probably harder to remember and more confusing to look at. It would be interesting to see someone try, though.
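For what it's worth, here is what the "different punctuation for everything" idea might look like in Python. This is purely hypothetical: only the [! !] pair comes from the paragraph above, and the other delimiters, the table name, and the function name are invented for the sake of the example.

```python
# Hypothetical delimiter table. Only "[!" ... "!]" is mentioned in the text
# above; every other pair here is invented for illustration.
OPENERS = {
    "[!": ("external-link", "!]"),
    "[[": ("internal-link", "]]"),
    "[:": ("interwiki-link", ":]"),
    "{{": ("template", "}}"),
}


def read_construct(text, pos=0):
    """Return (kind, body, next_pos) for the construct starting at pos, or None.

    The kind is decided from the two opening characters alone, so the parser
    never needs to look inside the body (e.g. at a URL protocol).
    """
    entry = OPENERS.get(text[pos:pos + 2])
    if entry is None:
        return None
    kind, closer = entry
    end = text.find(closer, pos + 2)
    if end == -1:
        return None  # unterminated construct
    return kind, text[pos + 2:end], end + len(closer)


if __name__ == "__main__":
    print(read_construct("[!http:/haha I tricked you!]"))
    print(read_construct("[[Main Page]]"))
```

With a table like this, [!http:/haha I tricked you!] is an external link no matter what the body says, which is easy on the parser and, as I said, probably hard on whoever has to remember the table.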