Ivan Krstić wrote:
Jay R. Ashworth wrote:
there's a project underway to formalize the very wikitext grammar with which you're compatible.
There are efforts to produce something formal-like for some subset of the wikitext syntax. To my knowledge, no useful formal grammar can be produced for the complete syntax.
Just to mention this again...
[[Parsing expression grammar]]s are rather powerful tools for this sort of thing; they are extremely useful for situations where you have ad-hoc BNF grammars with ambiguity and/or added constraints, and in particular where the boundary between the lexical and syntactic layers is ill-defined, as in Wikitext, which often results in multiple possible parses or unlimited lookahead.
PEGs can be generated directly from BNF, with the application of tweaks to avoid left-recursion.
[[Packrat parser]]s implement PEGs in linear time. The basic idea is brute-force top-down matching with memoization to prevent a combinatorial explosion: this seemingly naive approach actually performs very well in practice, and the amount of memory consumed is surprisingly small, by the standards of modern computers. They can be compiled into native code using a parser-generator approach, and can usually be sped up/slimmed down quite considerably using a bit of hackery, where small but common special case parts of the grammar can be replaced by hand-written code using fast algorthms such as regexps.
There are a number of GPL'd implementations of packrat parsers already available in a variety of languages.
-- Neil