[Wikitext-l] Cunningham's exploratory parsing

Brion Vibber bvibber at wikimedia.org
Tue Jul 12 00:17:21 UTC 2011


On Mon, Jul 11, 2011 at 4:46 PM, Erik Rose <erik at mozilla.com> wrote:

> Say, while everybody's trying to figure out a formal grammar, have you had
> a look at Ward Cunningham's exploratory parsing kit? He gave me a demo at
> OSBridge, and it's a really handy tool. Basically, it's a web app with an
> asynchronous C backend. You paste a tentative PEG grammar into a textarea,
> and it runs through whatever corpus you want, showing you representative
> instances of how it does or does not match. He was running it against the
> full English Wikipedia on his laptop, and it took only half an hour or
> something—with results coming in as they were generated, of course.


> Using that, they made a PEG-and-then-some implementation of MW syntax that
> parses darn near all of Wikipedia:
> https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg. (I call it
> "PEG-and-then-some" because it does have a lot of callbacks which might
> interlock with and affect the rule matching—not sure.)
>

It is indeed dang impressive -- I expect to be stealing at least some of
those grammar rules. :)

We are, however, producing a different sort of intermediate structure rather
than going straight to HTML output, so things won't be an exact match
(especially where we do template stuff).

-- brion