On 1/23/08, David Gerard dgerard@gmail.com wrote:
> Steve (and others): What needs to be done for the ANTLR grammar that can be parallelised, so that the many people desperately after reliable independent parsing of wikitext can contribute to the effort?
I can currently see two relatively independent tasks that are required here:

1) Analysis of wikitext: understanding how the current parser works, negotiation over which features are required, how borderline features should operate, etc.

2) Production of a useful, functional, efficient, readable ANTLR grammar.
I was doing well on (1); I've gotten bogged down in (2).
Suggestions for ways to help:

- recruit an ANTLR expert who could help fix my grammar, clean it up, and make it readable
- people to add some of the still-missing features (notably tables and HTML tags, also <ref> tags, though I'm not sure where they're best handled)
- general assistance expanding out the various features
- assistance with some of the nitty-gritty, like character classes, which I haven't really delved into (precise definitions of letter, punctuation, etc. that work for all languages...)
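To make the character-class problem concrete: in an ANTLR v3 lexer you'd end up writing fragment rules enumerating Unicode ranges by hand. The following is only an illustrative sketch (the rule names and ranges are mine, not from the actual grammar, and the ranges are deliberately incomplete):

```
// Hypothetical ANTLR v3 lexer fragments sketching what a
// Unicode-aware "letter" class might look like. The ranges
// here are illustrative only -- a real grammar would need to
// cover far more scripts than Latin.
fragment LETTER
    : 'a'..'z' | 'A'..'Z'         // basic Latin
    | '\u00c0'..'\u00d6'          // Latin-1 supplement (partial)
    | '\u00d8'..'\u00f6'
    | '\u0100'..'\u017f'          // Latin Extended-A
    ;

fragment DIGIT : '0'..'9' ;

WORD : LETTER+ ;
```

Maintaining such ranges by hand for every script is exactly the sort of grunt work that could be parallelised.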
> Also: how to speed up ANTLR-generated PHP, so this has half a chance of being implemented?
Ahem. There is no such thing as ANTLR-generated PHP. So for there to be even a quarter of a chance of such a thing being implemented, someone would first need to write a PHP target for ANTLR.
Based on my experience so far, I really don't like the chances of simply generating PHP out of the box with ANTLR and dropping it in. The Java code that's being generated so far is humongous and has a lot of problems, and ANTLR itself has bugs and unpleasant behaviour.
However, we do need a spec, and I don't know of a better way to specify the wikitext language than an ANTLR grammar. Whether or not the spec can automagically generate a working parser is a separate question... I think. Opinions welcome.
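To illustrate what "grammar as spec" might look like, here is a toy sketch of a few wikitext constructs in ANTLR-style notation. None of this is from the actual grammar; the rule names and structure are hypothetical, and real wikitext is far messier than this suggests (apostrophe handling alone is notoriously ambiguous):

```
// Illustrative-only sketch; not the real wikitext grammar.
bold_text
    : '\'\'\'' inline+ '\'\'\''
    ;

italic_text
    : '\'\'' inline+ '\'\''
    ;

internal_link
    : '[[' link_target ('|' link_text)? ']]'
    ;
```

Even if no working parser is ever generated from it, a grammar in this form would pin down behaviour precisely enough for independent implementations to agree on.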
Steve