Jan Hidders wrote:
On Monday 16 August 2004 19:35, Magnus Manske wrote:
So, what you'd need is an EBNF representation or something?
Mind if I jump in and suggest a substitution for the "or something" alternative? :) Check out "parsing expression grammar" on wikipedia (and in more detail on the external links that article leads to). Although at the moment you probably won't find a parser generator that'll generate the PHP code you want from it, if the primary goal here is formal specification then that's not such an issue - and in any case, unlike (LA)LR CFGs, parsing expression grammars tend to be very easy to convert manually into working parsers.
Yes, that is all I ask. A precise formal grammar in whatever notation you like but preferrably in the input format of bison. If only because that would document what it is that your parser exactly does. Note there are some requirements:
- it should accept all possible input strings
- it should ideally do all the necessary parsing i.e. from the parse tree we
should be able to generate the output with a single tree-walk, and
- it should be unambiguous (and even LALR(1)) or have an explicit conflict
resolution rule
The first and third requirements above are likely to be _very_ difficult to achieve at the same time in a CFG paradigm, because of the limited lookahead and extremely rudimentary disambiguation facilities of LR-class parser generators. LR parser generators are designed for languages that were designed for LR parser generators; they tend to be difficult or impossible to use for more freeform languages such as wikitext without making some serious compromises or horrible hacks. Not to discourage you from trying, though; I just want to point out an alternative that you might consider when the shift-reduce and reduce-reduce conflicts start becoming unbearable. :)
One caveat, though: I'm not exactly unbiased, since I wrote much of the aforementioned stuff on parsing expression grammars. :)
OK, I'll shut up now.
Cheers, Bryan
Bryan Ford wrote:
Jan Hidders wrote:
On Monday 16 August 2004 19:35, Magnus Manske wrote:
So, what you'd need is an EBNF representation or something?
Mind if I jump in and suggest a substitution for the "or something" alternative? :) Check out "parsing expression grammar" on wikipedia (and in more detail on the external links that article leads to).
You still need a formal grammar to start with. :-)
Although at the moment you probably won't find a parser generator that'll generate the PHP code you want from it, if the primary goal here is formal specification then that's not such an issue - and in any case, unlike (LA)LR CFGs, parsing expression grammars tend to be very easy to convert manually into working parsers.
Did you just say "formal specification"? :-) Without a parser generator there is no hope that we will have somewhere in the documentation a correct, precise and complete specification of the wikisyntax.
-- Jan Hidders
Bryan Ford wrote:
- it should accept all possible input strings
- it should ideally do all the necessary parsing i.e. from the parse tree we
should be able to generate the output with a single tree-walk, and
- it should be unambiguous (and even LALR(1)) or have an explicit conflict
resolution rule
The first and third requirements above are likely to be _very_ difficult to achieve at the same time in a CFG paradigm, because of the limited lookahead and extremely rudimentary disambiguation facilities of LR-class parser generators. LR parser generators are designed for languages that were designed for LR parser generators; they tend to be difficult or impossible to use for more freeform languages such as wikitext without making some serious compromises or horrible hacks.
Can you provide a more specific example of this? So far, I have not encountered a situation where accepting all possible input strings would be hard to do. In fact, all I need to do to ensure this is to allow the "text" non-terminal to contain any token.
Also, I have not found it difficult to craft the grammar in such a way that bison's default disambiguation rules (conflict resolution rules) produce the correct result. There does not seem to be any real need to have the grammar be unambiguous.
I just want to point out an alternative that you might consider when the shift-reduce and reduce-reduce conflicts start becoming unbearable. :)
So far, I have not had a reduce/reduce conflict, and, as I said, the shift/reduce conflicts are not a problem.
Timwi
On Tue, 2004-08-17 at 10:29 +0100, Timwi wrote:
Bryan Ford wrote: LR parser generators are designed for languages that were
designed for LR parser generators; they tend to be difficult or impossible to use for more freeform languages such as wikitext without making some serious compromises or horrible hacks.
Can you provide a more specific example of this? So far, I have not encountered a situation where accepting all possible input strings would be hard to do. In fact, all I need to do to ensure this is to allow the "text" non-terminal to contain any token.
Also, I have not found it difficult to craft the grammar in such a way that bison's default disambiguation rules (conflict resolution rules) produce the correct result. There does not seem to be any real need to have the grammar be unambiguous.
I agree, i didn't find any problems so far while working on the parser at http://moinmoin.wikiwikiweb.de/NewWikiParser. Works fine so far, but doesn't build the DOM tree yet (only returns simple text currently). My plan is to use cDomlette for its low memory footprint and good dom manipulation performance, haven't worked on it in the last two weeks though.
An alternative to BisonGen might be the 'Gold parser generator' (http://www.devincook.com/goldparser/) that does more languages than just C and python, especially C# and Java. The module wrapping would need to be done manually in that case, using swig or something. Also the generator isn't really open source and only runs on win. It can use normal BNF grammar though, not the more verbose BisonBen xmlized one.
My current code: http://dl.aulinx.de/wiki.bgen http://dl.aulinx.de/wikishort.py http://dl.aulinx.de/wikishort.c
wikitech-l@lists.wikimedia.org