On Mon, Oct 13, 2003 at 09:05:14PM +0100, Neil Harris wrote:
I would imagine that a formal grammar for Wikipedia markup (for example using EBNF) might be a good thing.
This could then be used for four purposes:
- to define the grammar clearly for technical reference purposes
- to allow the generation of parsers using parser-generator compilers
which are available for a large number of languages, including C/C++, Python, Java etc.
- to help define a Document Object Model for the output of the parser
- to allow parsers to be validated as correct, by allowing the
compilation of a set of unit tests
However, what makes this difficult is that there should be no invalid documents in wiki markup: everything should produce some output, even if it is partly broken. For example, opening a link body but not closing it should end up with a literal "[[" in the text, not a parser error. Another way of putting it is that _all_ strings should be valid productions of the grammar. Done naively, however, this can end up with an ambiguous grammar in which the same input can be parsed two ways.
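
To make the "no invalid documents" requirement concrete, here is a rough Python sketch of a parser that can never fail. It is only an illustration, not how the real Wikipedia parser works: the link syntax is a simplified assumption ([[target]] or [[target|label]] only), and the names LINK and parse are made up for this example. The link pattern is tried first and plain text is the fallback, so every string is accepted and each input has exactly one parse.

```python
import re

# Assumed, simplified link syntax: [[target]] or [[target|label]].
# Real wiki links have many more cases; this is illustrative only.
LINK = re.compile(r"\[\[([^\[\]|\n]+)(?:\|([^\[\]\n]*))?\]\]")


def parse(text):
    """Return a list of ("text", s) and ("link", target, label) nodes.

    Never raises on malformed markup: anything that does not match
    LINK is emitted unchanged as literal text.
    """
    nodes = []
    pos = 0
    for m in LINK.finditer(text):
        # Everything between the previous match and this one is plain text.
        if m.start() > pos:
            nodes.append(("text", text[pos:m.start()]))
        target = m.group(1)
        # With no "|label" part, the label defaults to the target.
        label = m.group(2) if m.group(2) is not None else target
        nodes.append(("link", target, label))
        pos = m.end()
    # Trailing text, including any unclosed "[[", stays literal.
    if pos < len(text):
        nodes.append(("text", text[pos:]))
    return nodes
```

With this sketch, parse("an [[unclosed link") yields a single text node that still contains the literal "[[", rather than an error. A formal grammar would need to express the same fallback, for instance through ordered choice between the link rule and a catch-all text rule.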
Has anyone made an attempt at defining a formal grammar for Wikipedia?
Also, any such grammar should avoid the mistakes listed here: http://en.wikipedia.org/wiki/User:Marumari/Wikitext_Rendering_Quirks