On 11/9/07, Jim Wilson <wilson.jim.r@gmail.com> wrote:
For another example, consider #REDIRECTs. When the #REDIRECT pattern is encountered at the beginning of a page, any subsequent content is ignored (stripped at submission time). And the "output" is variable. That is, it has an effect on the system whereby the rendered output depends on the viewing context - either it redirects to another page, or renders a link thereto.
Redirects are a special case that should be handled before passing off to the parser. This kind of thing does add extra complexity, yes, but it could be incorporated into a clear specification nonetheless.
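To make "handled before passing off to the parser" concrete, here is a minimal sketch in Python. The regex is a deliberate simplification and an assumption on my part, not MediaWiki's actual matching rule, and the names `split_redirect`, `handle_page`, `parse` and `follow_redirect` are hypothetical.

```python
import re

# Simplified pattern: "#REDIRECT" at the very start of the page, followed by
# a [[Target]] link. Assumption: the real rule is more permissive than this.
REDIRECT_RE = re.compile(r"^#REDIRECT\s*\[\[([^\]|#]+)[^\]]*\]\]", re.IGNORECASE)

def split_redirect(wikitext):
    """Return (target, None) for a redirect page, else (None, wikitext)."""
    m = REDIRECT_RE.match(wikitext)
    if m:
        # Anything after the redirect line is ignored, as described above.
        return m.group(1).strip(), None
    return None, wikitext

def handle_page(wikitext, parse, follow_redirect):
    """Decide what to do with a page before the main parser ever sees it."""
    target, body = split_redirect(wikitext)
    if target is not None:
        # The result depends on viewing context: redirect, or show a link.
        return ("redirect", target) if follow_redirect else ("link", target)
    return ("html", parse(body))
```

The main parser (`parse` here) only ever receives non-redirect pages, so the context-dependent behaviour stays outside the grammar.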
Also consider extension tags. . . . Perhaps even more complex is the treatment of parser functions . . .
Ah, but those aren't going to be part of the "main" parser. Any parser would have to make two main passes: a recursive pass to substitute templates and parser functions, and a non-recursive pass to deal with the resulting markup. The first pass would use only a very simple grammar, but would inherently require database access and knowledge of configuration. The second pass would not need to consider any parser tags or curly-brace constructs, but would need to know the bulk of the grammar, and that is the only place where the difficulties of formal specification arise. Defining the formal grammar for the first pass is all but trivial, since it only needs to parse two constructs: curly braces, which behave very straightforwardly, and XML-ish tags, which for the most part do too.
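A rough sketch of the two-pass shape, in Python, under heavy assumptions: templates take no arguments, parser functions and XML-ish tags are left out entirely, a dict stands in for the database, and the second pass knows only bold and italic. All names (`TEMPLATES`, `expand`, `render`, `to_html`) are hypothetical.

```python
import re

# Stand-in for the database lookup that the first pass inherently needs.
TEMPLATES = {
    "greeting": "'''Hello''', {{name}}!",
    "name": "''world''",
}

# Matches a {{Name}} call with no arguments and no nested braces.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)\}\}")

def expand(text, lookup, depth=0, max_depth=40):
    """Pass 1 (recursive): substitute template calls with their bodies."""
    if depth > max_depth:
        raise RecursionError("template expansion too deep (possible loop)")

    def substitute(match):
        body = lookup(match.group(1).strip())
        if body is None:
            return match.group(0)  # unknown template: leave the call as-is
        return expand(body, lookup, depth + 1, max_depth)

    return TEMPLATE_CALL.sub(substitute, text)

def render(text):
    """Pass 2 (non-recursive): turn the remaining markup into HTML."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)  # bold first
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)    # then italic
    return text

def to_html(text, lookup):
    return render(expand(text, lookup))
```

With these definitions, `to_html("{{greeting}}", TEMPLATES.get)` yields `<b>Hello</b>, <i>world</i>!`. Only `expand` touches the lookup, and only `render` needs the bulk of the markup rules.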
The best one could hope for might be to define the basic wikitext markup language, ignoring the meanings of Namespaces, templating/transclusion, extension tags and parser functions. Even then, what use is such a grammar? It probably won't help simplify the MediaWiki Parser significantly since all the ignored features would still need to be accounted for, as they would be in any other application that hopes to integrate with MW syntax (for example an external WYSIWYG editor).
The grammar is only part of what you need to know to make sense of a page, if you want to do anything other than syntax highlighting. The grammar of C is only a small part of the C standard; if we standardize and clarify the meaning of the MediaWiki parser, the actual BNF grammars will only constitute part of the resulting document.