[Wikitext-l] MediaWiki parser in Python

Brion Vibber bvibber at wikimedia.org
Wed Jun 29 23:48:37 UTC 2011


On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <peter017 at gmail.com> wrote:

> I have been working as a student in the 2011 edition of the Google
> Summer of Code, building a MediaWiki parser [1] for the Mozilla Foundation.
> My mentor is Erik Rose.
>
> For this purpose, we use a Python PEG parser called Pijnu [2] and
> implement a grammar for it [3]. This way, we parse the wikitext into
> an abstract syntax tree that we will then transform to HTML or other
> formats.
>
> One of the advantages of Pijnu is the simplicity and readability of
> the grammar definition [3]. It is not finished yet, but what we have
> done so far seems very promising.
>

Neat! Your life is definitely made easier by skipping full compatibility
with some of our freakier syntax oddities ;) but a parser like that will still
be very handy for various embedded-style "lite wiki" usages.
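For anyone on the list who hasn't played with this kind of pipeline: the
parse-to-AST-then-render flow Peter describes looks roughly like the toy
sketch below. This is hand-rolled Python purely for illustration, not Pijnu's
actual API or grammar, and it only handles '''bold''' and [[links]]:

    import re
    from html import escape  # Python 3; a real parser would escape more carefully

    # Toy "grammar": '''bold''' and [[Target|label]] links; everything else is text.
    TOKEN = re.compile(r"'''(.*?)'''|\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")

    def parse(wikitext):
        """First pass: build a flat AST of ('text'|'bold'|'link', ...) nodes."""
        ast, pos = [], 0
        for m in TOKEN.finditer(wikitext):
            if m.start() > pos:
                ast.append(('text', wikitext[pos:m.start()]))
            if m.group(1) is not None:                 # '''bold'''
                ast.append(('bold', m.group(1)))
            else:                                      # [[Target|label]]
                ast.append(('link', m.group(2), m.group(3) or m.group(2)))
            pos = m.end()
        if pos < len(wikitext):
            ast.append(('text', wikitext[pos:]))
        return ast

    def to_html(ast):
        """Second pass: render the AST to HTML."""
        out = []
        for node in ast:
            if node[0] == 'text':
                out.append(escape(node[1]))
            elif node[0] == 'bold':
                out.append('<b>%s</b>' % escape(node[1]))
            else:
                out.append('<a href="/wiki/%s">%s</a>'
                           % (escape(node[1]), escape(node[2])))
        return ''.join(out)

    print(to_html(parse("'''Pijnu''' parses [[Wikitext|wiki markup]] into a tree.")))
    # -> <b>Pijnu</b> parses <a href="/wiki/Wikitext">wiki markup</a> into a tree.

The nice thing about splitting parsing from rendering is that the same tree
can feed several output passes without re-parsing, which is what the
"transform to HTML or other formats" step above is getting at.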

Great list of alternatives, libraries & algorithms in your notes too, though
obviously mostly Python-oriented; looks like you've already looked at
PediaPress's mwlib library, which is also Python-based. It's definitely a
bit... hairier due to having to handle more of our funky syntax (it drives
the PDF download and print-on-demand system on Wikipedia).

I'm still looking around for good parser generator tools for PHP (we've been
fiddling with PEG.js in some of our JavaScript-side experiments so far, but we
will eventually need both JS and PHP implementations to cover editing tools
and actual back-end rendering), so if anybody stumbles on good existing ones,
give a shout, or we may have to roll our own.

Bonus points if we can eventually share the formal grammar production rules
between multiple language implementations. :)

-- brion