On 12/11/07, David Gerard dgerard@gmail.com wrote:
I'm about to head off for a week and a half, so here's a quick progress report. My ANTLR grammar so far is here: http://www.mediawiki.org/wiki/User:Stevage/ANTLR It handles many features, but most aren't really complete. I offer this up just for curiosity's sake - no one should try to hack on it ;) [hrm, on closer inspection, that's not the latest version of that file. oh well.]
You should link the above from the ANTLR page and include this email at the top of it.
It's a wiki, isn't it? Feel free. :)
This is still very much work in progress and hasn't been tidied up at all. I would be interested to hear whether anyone finds this ANTLR grammar readable and meaningful at all. If the grammar is not expressive and readable, there's not much point having it.
I'm especially troubled by the syntactic predicates which seem to be required to suppress warnings by the ANTLR compiler. These are the ones that look like:
rule: (option1) => option1 | (option2) => option2;
Most of the time this behaves exactly the same as:
rule: option1 | option2;
but if option1 and option2 can match the same input, ANTLR generates a warning unless the syntactic predicates are there. With the predicates, however, it ends up parsing the text twice (I think): once to check whether the predicate will succeed, then once for real. It's a pretty annoying trade-off: readability and performance vs. no warnings and certainty of execution path.
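To make the "parsing twice" worry concrete, here is a rough sketch in Python of a hand-written recursive-descent parser that treats a syntactic predicate as "try the alternative, rewind, then parse it for real". This is only a toy model under my own assumptions, not ANTLR's generated code: the Parser class, the two link-like alternatives and the step counter are all made up for illustration, and ANTLR may well skip the speculative pass when plain lookahead is enough to decide.

```python
class Parser:
    """Toy recursive-descent parser; every name here is hypothetical."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.steps = 0  # number of match()/word() calls, speculative or real

    def match(self, literal):
        """Consume `literal` at the current position or raise SyntaxError."""
        self.steps += 1
        if not self.text.startswith(literal, self.pos):
            raise SyntaxError(f"expected {literal!r} at {self.pos}")
        self.pos += len(literal)

    def word(self):
        """Consume one or more alphanumeric characters."""
        self.steps += 1
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isalnum():
            self.pos += 1
        if self.pos == start:
            raise SyntaxError(f"expected a word at {self.pos}")

    def option1(self):
        """option1: '[[' word ']]'"""
        self.match("[[")
        self.word()
        self.match("]]")

    def option2(self):
        """option2: '[[' word '|' word ']]' (same prefix as option1)"""
        self.match("[[")
        self.word()
        self.match("|")
        self.word()
        self.match("]]")

    def predicate(self, alternative):
        """Model of (alt) =>: run the alternative speculatively, then rewind."""
        start = self.pos
        try:
            alternative()
            return True
        except SyntaxError:
            return False
        finally:
            self.pos = start  # discard whatever the speculative pass consumed

    def rule(self):
        """rule: (option1) => option1 | (option2) => option2;"""
        if self.predicate(self.option1):
            self.option1()
        elif self.predicate(self.option2):
            self.option2()
        else:
            raise SyntaxError("no alternative matched")


def count_steps(text):
    """Parse `text` with rule() and report how many steps were taken."""
    parser = Parser(text)
    parser.rule()
    return parser.steps
```

In this model, count_steps("[[a]]") does the three steps of option1 twice, and an input that only option2 can match additionally pays for the failed attempt at option1 before its own speculative and real passes. That is the cost I suspect the predicates add, though I haven't confirmed that ANTLR behaves this way.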
I'm also a bit concerned about the eventual performance of this thing. Already, parsing a page of wikitext seems to take a very, very long time (e.g. 10 seconds), but I don't know how much of that is caused by the environment (the JVM), the debugger, etc. And of course my grammar is pretty inefficient in many ways.
Steve