Should we have a formal grammar? Let's be pragmatic -- a formal grammar is a means to a couple of ends as far as I see it.
1 - to easily have equivalent parsers in PHP and JS, and to allow the community to help develop it in an interactive way a la ParserPlayground.
This is not an either-or thing. If the parser is MOSTLY formal, that's good enough. But we should still be shooting for like 97% of the cases to be handled by the grammar.
97% of the context-free portions might be possible, but my feeling is that once you start pushing what context-free grammars can directly do, the grammar quickly becomes really messy and hard to maintain or comprehend. The context-free portion contains most wiki syntax, but does not cover larger-scale structures, including HTML tags, because their markup can overlap.
Converting arbitrarily overlapped structures (or tag soup in general) to a sensible *tree* requires random-access stacks, and falls outside CFGs. Different strategies are possible in this space, with the HTML5 spec being one.
AFAICT there are no popular formalisms for automata with random-access stacks, so any standardization will probably look very much like the HTML5 spec: a discussion of all cases in prose. If the HTML5 spec turns out to be good enough, then we don't have to standardize that part of the parser ourselves, and implementations in different languages and browsers are already available, which would be good for portability.
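To make the overlap problem concrete, here is a rough Python sketch of one possible repair strategy: when a close tag matches something below the top of the open-element stack, close everything above it and reopen those elements afterwards. This is only an illustration of why the stack needs more than push/pop access. It is not the HTML5 algorithm, and the token format and the name repair_overlap are made up for this example.

```python
def repair_overlap(tokens):
    """Turn possibly overlapping open/close tokens into properly nested markup.

    tokens is a list of (kind, value) pairs, where kind is "open", "close"
    or "text". This token shape is an assumption made for the sketch.
    """
    stack = []  # names of currently open elements, innermost last
    out = []
    for kind, value in tokens:
        if kind == "text":
            out.append(value)
        elif kind == "open":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "close":
            if value not in stack:
                continue  # stray close tag: drop it
            # Find the innermost matching open element. It may sit below
            # the top of the stack, which is the non-LIFO access a plain
            # pushdown automaton does not have.
            idx = len(stack) - 1 - stack[::-1].index(value)
            reopened = stack[idx + 1:]
            for name in reversed(reopened):
                out.append(f"</{name}>")  # close the overlapping elements
            out.append(f"</{value}>")
            del stack[idx:]
            for name in reopened:
                stack.append(name)
                out.append(f"<{name}>")  # reopen them after the close
    for name in reversed(stack):
        out.append(f"</{name}>")  # close whatever is still open at the end
    return "".join(out)


tokens = [
    ("open", "b"), ("text", "bold "),
    ("open", "i"), ("text", "both"),
    ("close", "b"), ("text", " italic"),
    ("close", "i"),
]
print(repair_overlap(tokens))
# prints <b>bold <i>both</i></b><i> italic</i>
```

Even this simplest variant has to look into and rewrite the middle of the stack, and real tag soup needs many more special cases, which is why a prose spec in the HTML5 style seems more realistic than a grammar for this part.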
2 - to give others a way to parse wikitext better.
This may not be necessary. If our parser can produce a nice abstract syntax tree at some point, the API can just emit some other regular format for people to use, perhaps XML or JSON based. Wikidom is more optimized for the editor, but it's probably also good for this purpose.
Even an annotated HTML DOM (using the data-* attributes for example) could be used. We might actually be able to off-load most context-sensitive parts of the parsing process to the browser's HTML parser by feeding it pre-tokenized HTML tag soup, for example via .innerHTML.
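As a rough idea of what such an annotated output could look like, here is a small Python sketch that serializes one toy syntax tree both as JSON and as HTML with data-* attributes. The node layout (tag, attrs, wiki, children), the data-wiki-* attribute names and the function to_annotated_html are all assumptions for illustration, not an existing Wikidom or parser format.

```python
import json
from html import escape


def to_annotated_html(node):
    """Serialize a toy syntax-tree node to HTML, keeping wiki metadata
    in data-wiki-* attributes (a hypothetical naming scheme)."""
    if isinstance(node, str):
        return escape(node)
    attrs = dict(node.get("attrs", {}))
    for key, value in node.get("wiki", {}).items():
        attrs[f"data-wiki-{key}"] = value
    attr_text = "".join(
        f' {name}="{escape(str(value), quote=True)}"'
        for name, value in attrs.items()
    )
    inner = "".join(to_annotated_html(child) for child in node.get("children", []))
    return f"<{node['tag']}{attr_text}>{inner}</{node['tag']}>"


# A toy tree for the wikitext "See [[Foo]]."
tree = {
    "tag": "p",
    "children": [
        "See ",
        {
            "tag": "a",
            "attrs": {"href": "/wiki/Foo"},
            "wiki": {"type": "internal-link", "target": "Foo"},
            "children": ["Foo"],
        },
        ".",
    ],
}

print(json.dumps(tree))         # the same tree as a JSON-based format
print(to_annotated_html(tree))  # the same tree as annotated HTML
# second line prints:
# <p>See <a href="/wiki/Foo" data-wiki-type="internal-link" data-wiki-target="Foo">Foo</a>.</p>
```

The point is only that the wiki-specific information survives in attributes that ordinary HTML consumers ignore, so one output could serve both rendering and tools that want the structure.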
Gabriel