Ivan Krstić wrote:
Jay R. Ashworth wrote:
there's a project underway to formalize the
very wikitext grammar with
which you're compatible.
There are efforts to produce something formal-like for some subset of
the wikitext syntax. To my knowledge, no useful formal grammar can be
produced for the complete syntax.
Just to mention this again...
[[Parsing expression grammar]]s are rather powerful tools for this sort
of thing; they are extremely useful for situations where you have ad-hoc
BNF grammars with ambiguity and/or added constraints, and in particular
where the boundary between the lexical and syntactic layers is
ill-defined, as in Wikitext, which often results in multiple possible
parses or unlimited lookahead.
PEGs can be generated directly from BNF, with the application of tweaks
to avoid left-recursion.
[[Packrat parser]]s implement PEGs in linear time. The basic idea is
brute-force top-down matching with memoization to prevent a
combinatorial explosion: this seemingly naive approach actually performs
very well in practice, and the amount of memory consumed is surprisingly
small, by the standards of modern computers. They can be compiled into
native code using a parser-generator approach, and can usually be sped
up/slimmed down quite considerably using a bit of hackery, where small
but common special case parts of the grammar can be replaced by
hand-written code using fast algorthms such as regexps.
There are a number of GPL'd implementations of packrat parsers already
available in a variety of languages.
-- Neil