Hi
My recommendation is to address the actual reason why someone might
want a context-free grammar in the first place. Considering how much
time and creative energy have been spent trying to create the
one true parser, I wonder whether it would be easier to simply port
the existing parser to other languages directly (regular expressions
and all). I bet it would be.
My experience is that it was easier to learn how the parser behaves by
studying example outputs than to deduce it from the source code.
In the ideal case, a clean parser implementation (in any language) would
be almost as good as a formal definition of the syntax. As I see it, that
is basically why all of us are trying to come up with a context-free
grammar: it gives you a parser that is easy to understand and easy to
port to other platforms.
Now, the current MediaWiki parser is far from clean. It's not a simple
class with well-defined interfaces that you can drop into another PHP
program. It also doesn't generate a clean parse tree: it mangles
strings until it arrives at something HTML-like and then cleans that up.
Since the syntax includes all sorts of ways a page can interact with
data outside of the current page, the interface of such a pluggable class
would probably be fairly complex. Maybe one way of making some progress
is to agree on this interface and push the existing parser towards it?
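To make the idea a bit more concrete, here is a minimal sketch (Python, for illustration only) of what such an interface could look like. It assumes the outside data boils down to two questions: does a page exist, and what is a template's source text. All names here (PageContext, page_exists, fetch_template, WikitextParser, Node, DictContext) are hypothetical and are not MediaWiki APIs. The toy parser only recognises plain [[Target]] links and {{Name}} templates, and it returns a tree instead of HTML.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Protocol


class PageContext(Protocol):
    """Hypothetical: everything the parser may ask about data outside the page."""

    def page_exists(self, title: str) -> bool: ...
    def fetch_template(self, name: str) -> str | None: ...


@dataclass
class Node:
    kind: str  # "text", "link" or "template"
    value: str
    children: list[Node] = field(default_factory=list)
    attrs: dict[str, object] = field(default_factory=dict)


# Toy tokenizer: only [[Target]] and {{Name}} without pipes or nesting.
TOKEN = re.compile(r"\[\[([^\]|]+)\]\]|\{\{([^}|]+)\}\}")
MAX_DEPTH = 5  # guard against templates that include themselves


class WikitextParser:
    def __init__(self, context: PageContext) -> None:
        self.context = context  # the only door to outside data

    def parse(self, text: str, depth: int = 0) -> list[Node]:
        nodes: list[Node] = []
        pos = 0
        for m in TOKEN.finditer(text):
            if m.start() > pos:  # plain text before the token
                nodes.append(Node("text", text[pos:m.start()]))
            if m.group(1) is not None:  # [[link]]
                title = m.group(1).strip()
                exists = self.context.page_exists(title)
                nodes.append(Node("link", title, attrs={"exists": exists}))
            else:  # {{template}}: expand its source recursively
                name = m.group(2).strip()
                body = self.context.fetch_template(name)
                children = []
                if body is not None and depth < MAX_DEPTH:
                    children = self.parse(body, depth + 1)
                nodes.append(Node("template", name, children))
            pos = m.end()
        if pos < len(text):  # trailing plain text
            nodes.append(Node("text", text[pos:]))
        return nodes


class DictContext:
    """Hypothetical in-memory context, e.g. for tests or another platform."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def page_exists(self, title: str) -> bool:
        return title in self.pages

    def fetch_template(self, name: str) -> str | None:
        return self.pages.get("Template:" + name)


ctx = DictContext({"Main Page": "", "Template:Hello": "Hi [[Main Page]]"})
tree = WikitextParser(ctx).parse("See [[Main Page]] and {{Hello}}.")
```

The point of the design is that a port to another language only has to reimplement the parser and one small context object; the wiki (or a test harness) supplies the data behind it.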
By the way, are any of you attending Wikimania? I would love to
participate in any discussion on this topic.
Best regards
Tomaž