On Mon, Jul 11, 2011 at 4:46 PM, Erik Rose <erik(a)mozilla.com> wrote:
Say, while everybody's trying to figure out a formal grammar, have you had
a look at Ward Cunningham's exploratory parsing kit? He gave me a demo at
OSBridge, and it's a really handy tool. Basically, it's a web app with an
asynchronous C backend. You paste a tentative PEG grammar into a textarea,
and it runs through whatever corpus you want, showing you representative
instances of how it does or does not match. He was running it against the
full English Wikipedia on his laptop, and it took only half an hour or
something—with results coming in as they were generated, of course.
Using that, they made a PEG-and-then-some implementation of MW syntax that
parses darn near all of Wikipedia:
https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg. (I call it
"PEG-and-then-some" because it does have a lot of callbacks which might
interlock with and affect the rule matching—not sure.)
It is indeed dang impressive -- I expect to be stealing at least some of
those grammar rules. :)
We are however producing a different sort of intermediate structure rather
than going straight to HTML output, so things won't be an exact match
(especially where we do template stuff).
-- brion