On Tue, 2004-08-17 at 10:29 +0100, Timwi wrote:
Bryan Ford wrote:
LR parser generators are designed for languages that were designed for LR parser generators; they tend to be difficult or impossible to use for more freeform languages such as wikitext without making some serious compromises or horrible hacks.
Can you provide a more specific example of this? So far, I have not encountered a situation where accepting all possible input strings would be hard to do. In fact, all I need to do to ensure this is to allow the "text" non-terminal to contain any token.
Also, I have not found it difficult to craft the grammar in such a way that bison's default disambiguation rules (conflict resolution rules) produce the correct result. There does not seem to be any real need to have the grammar be unambiguous.
I agree; I haven't run into any problems so far while working on the parser at http://moinmoin.wikiwikiweb.de/NewWikiParser. It works fine so far, but it doesn't build the DOM tree yet (it currently only returns simple text). My plan is to use cDomlette for its low memory footprint and good DOM manipulation performance, though I haven't worked on it in the last two weeks.
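To make the catch-all "text" idea quoted above concrete, here is a rough sketch in plain Python. It is not the BisonGen grammar from my files and not generated parser code. The token set, the PAIRS table and the node tuples are made up for illustration only. The point it tries to show is that if any token may fall back to being plain text, no input string can be rejected.

```python
import re

# Hypothetical token set for illustration only (not the real wiki grammar):
# bold ''', italic '', link brackets, runs of ordinary characters, and a
# final "any single character" alternative so nothing is ever dropped.
TOKEN_RE = re.compile(r"'''|''|\[\[|\]\]|[^'\[\]]+|.", re.DOTALL)

# Opening token -> (closing token, node name). Assumed names.
PAIRS = {
    "'''": ("'''", "bold"),
    "''": ("''", "italic"),
    "[[": ("]]", "link"),
}


def tokenize(source):
    """Split the input into tokens; joining them gives back the input."""
    return TOKEN_RE.findall(source)


def parse(tokens):
    """Build a list of nodes. Never fails: unmatched markup becomes text."""
    nodes = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in PAIRS:
            closer, name = PAIRS[tok]
            try:
                j = tokens.index(closer, i + 1)
            except ValueError:
                j = -1
            if j != -1:
                # Matched pair: parse the enclosed tokens recursively.
                nodes.append((name, parse(tokens[i + 1:j])))
                i = j + 1
                continue
        # Catch-all rule: any token at all is acceptable as plain text.
        nodes.append(("text", tok))
        i += 1
    return nodes


if __name__ == "__main__":
    print(parse(tokenize("''a'' [[b")))
```

The unclosed "[[" in the example simply ends up as a text node instead of causing a syntax error, which is the same effect as letting the "text" non-terminal contain any token in the grammar.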
An alternative to BisonGen might be the 'Gold parser generator' (http://www.devincook.com/goldparser/), which targets more languages than just C and Python, notably C# and Java. The module wrapping would need to be done manually in that case, using SWIG or something similar. Also, the generator isn't really open source and only runs on Windows. It can use a normal BNF grammar though, not the more verbose XML-based one that BisonGen uses.
My current code: http://dl.aulinx.de/wiki.bgen http://dl.aulinx.de/wikishort.py http://dl.aulinx.de/wikishort.c