On Mon, Jul 11, 2011 at 4:46 PM, Erik Rose <erik@mozilla.com> wrote:
Say, while everybody's trying to figure out a formal grammar, have you had a look at Ward Cunningham's exploratory parsing kit? He gave me a demo at OSBridge, and it's a really handy tool. Basically, it's a web app with an asynchronous C backend. You paste a tentative PEG grammar into a textarea, and it runs that grammar against whatever corpus you want, showing you representative instances of where it does or does not match. He was running it against the full English Wikipedia on his laptop, and it took only half an hour or so, with results coming in as they were generated, of course.
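For anyone who hasn't seen the kit, the core loop it automates can be sketched roughly like this. This is a minimal, batch-mode Python sketch, not Ward's code: it doesn't stream results or run asynchronously, a regex stands in for a real PEG rule, and the names `explore` and `is_heading` plus the heading pattern are hypothetical, chosen only for illustration.

```python
import re
from typing import Callable, Iterable

def explore(rule: Callable[[str], bool], corpus: Iterable[str], samples: int = 3):
    """Run a tentative rule over every document and keep a few
    representative matches and non-matches, plus overall counts."""
    hits, misses = [], []
    n_hit = n_miss = 0
    for doc in corpus:
        if rule(doc):
            n_hit += 1
            if len(hits) < samples:
                hits.append(doc)
        else:
            n_miss += 1
            if len(misses) < samples:
                misses.append(doc)
    return {"matched": n_hit, "unmatched": n_miss, "hits": hits, "misses": misses}

# Hypothetical tentative rule: a MediaWiki-style heading such as "== History ==".
# The backreference \1 requires the closing run of "=" to match the opening run.
heading = re.compile(r"(={1,6}) *[^=\n]+? *\1")

def is_heading(line: str) -> bool:
    return heading.fullmatch(line.strip()) is not None

corpus = ["== History ==", "=== Early life ===", "== Broken =", "Plain text."]
report = explore(is_heading, corpus)
print(report["matched"], report["unmatched"])  # prints: 2 2
print(report["misses"])  # prints: ['== Broken =', 'Plain text.']
```

The useful part is the "misses" sample: eyeballing a handful of real non-matches is what tells you whether the grammar or the page is at fault.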
Using that, they made a PEG-and-then-some implementation of MW syntax that parses darn near all of Wikipedia: https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg. (I call it "PEG-and-then-some" because it has a lot of callbacks that might interlock with and affect the rule matching; I'm not sure.)
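For anyone newer to PEGs, the main difference from a context-free grammar is ordered choice: alternatives are tried in order and the first success wins, so there's no ambiguity to resolve. Below is a minimal hand-rolled Python sketch of that idea. The combinator names (`lit`, `chars_except`, `seq`, `choice`, `action`) and the link rule are hypothetical and are not taken from kiwi's syntax.leg; the `action` wrapper is only loosely analogous to the brace-delimited actions in a leg grammar, i.e. the kind of callbacks mentioned above.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos) on success, None on failure.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]

def lit(s: str) -> Parser:
    """Match an exact literal string."""
    def p(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return p

def chars_except(stop: str) -> Parser:
    """One or more characters not in `stop` (like [^...]+ in a PEG)."""
    def p(text, pos):
        end = pos
        while end < len(text) and text[end] not in stop:
            end += 1
        return (text[pos:end], end) if end > pos else None
    return p

def seq(*parsers: Parser) -> Parser:
    """Match each sub-parser in turn; fail if any one fails."""
    def p(text, pos):
        values = []
        for sub in parsers:
            r = sub(text, pos)
            if r is None:
                return None
            v, pos = r
            values.append(v)
        return values, pos
    return p

def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first alternative that succeeds is committed to."""
    def p(text, pos):
        for sub in parsers:
            r = sub(text, pos)
            if r is not None:
                return r
        return None
    return p

def action(parser: Parser, fn: Callable[[Any], Any]) -> Parser:
    """Run a callback on a successful match to build a result value."""
    def p(text, pos):
        r = parser(text, pos)
        return None if r is None else (fn(r[0]), r[1])
    return p

# Hypothetical rule for MediaWiki internal links:
#   link <- "[[" target "|" label "]]" / "[[" target "]]"
target = chars_except("|]")
label = chars_except("]")
piped = action(seq(lit("[["), target, lit("|"), label, lit("]]")),
               lambda v: ("link", v[1], v[3]))
plain = action(seq(lit("[["), target, lit("]]")),
               lambda v: ("link", v[1], v[1]))
link = choice(piped, plain)

print(link("[[Main Page|home]]", 0))   # (('link', 'Main Page', 'home'), 18)
print(link("[[Ward Cunningham]]", 0))  # (('link', 'Ward Cunningham', 'Ward Cunningham'), 19)
print(link("[[unclosed", 0))           # None
```

Putting `piped` before `plain` matters only for readability here, since the two can't both succeed on the same input; in real wikitext rules, alternative order often decides which reading wins.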