On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <peter017@gmail.com> wrote:
I have been working as a student on the 2011 edition of the Google
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
My mentor is Erik Rose.

For this purpose, we use a Python PEG parser called Pijnu [2] and
implement a grammar for it [3]. This way, we parse the wikitext into
an abstract syntax tree that we will then transform to HTML or other
formats.

One of the advantages of Pijnu is the simplicity and readability of
the grammar definition [3]. It is not finished yet, but what we have
done so far seems very promising.

Neat! Your life is definitely made easier by skipping full compatibility with some of our freakier syntax oddities ;) which'll still be very handy for various embedded-style "lite wiki" usages.

Great list of alternatives, libraries & algorithms in your notes too though obviously mostly Python-oriented; looks like you've already looked at PediaPress's mwlib library, which is also Python-based. It's definitely a bit... hairier due to having to handle more of our funky syntax (it drives the PDF download and print-on-demand system on Wikipedia).

I'm still looking around for good parser generator tools for PHP (we've been fiddling with PEG.js in some of our JavaScript-side experiments so far but will eventually need both JS and PHP implementations to cover editing tools and actual back-end rendering), so if anybody stumbles on good existing ones give a shout or we may have to roll some our own.

Bonus points if we can eventually share the formal grammar production rules between multiple language implementations. :)

-- brion