On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <peter017(a)gmail.com> wrote:
> I have been working as a student on the 2011 edition of the Google
> Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
> My mentor is Erik Rose.
>
> For this purpose, we use a Python PEG parser called Pijnu [2] and
> implement a grammar for it [3]. This way, we parse the wikitext into
> an abstract syntax tree that we will then transform to HTML or other
> formats.
>
> One of the advantages of Pijnu is the simplicity and readability of
> the grammar definition [3]. It is not finished yet, but what we have
> done so far seems very promising.
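To make the pipeline concrete, here's a rough sketch of the wikitext -> AST -> HTML flow in plain Python. This is a hand-rolled toy recursive-descent parser for a tiny subset of the syntax (bold '''...''' and plain text), not Pijnu's actual API or the project's real grammar:

```python
def parse(text):
    """Parse a toy wikitext subset into a list-of-tuples AST."""
    ast, pos = [], 0
    while pos < len(text):
        if text.startswith("'''", pos):
            end = text.find("'''", pos + 3)
            if end == -1:
                # Unmatched quotes fall through as plain text.
                ast.append(("text", text[pos:]))
                break
            ast.append(("bold", text[pos + 3:end]))
            pos = end + 3
        else:
            nxt = text.find("'''", pos)
            chunk = text[pos:] if nxt == -1 else text[pos:nxt]
            ast.append(("text", chunk))
            pos += len(chunk)
    return ast

def to_html(ast):
    """Transform the AST into HTML -- one of several possible outputs."""
    return "".join(
        "<b>%s</b>" % value if kind == "bold" else value
        for kind, value in ast
    )

# parse("a '''b''' c") yields
#   [("text", "a "), ("bold", "b"), ("text", " c")]
# and to_html(...) renders "a <b>b</b> c".
```

The point of the intermediate AST is exactly what Peter describes: the same tree can be handed to different back-ends (HTML, PDF, plain text) without re-parsing.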
Neat! Your life is definitely made easier by skipping full compatibility
with some of our freakier syntax oddities ;) and a parser like that will
still be very handy for various embedded-style "lite wiki" usages.
Great list of alternatives, libraries & algorithms in your notes too,
though it's obviously mostly Python-oriented; looks like you've already
looked at PediaPress's mwlib library, which is also Python-based. It's
definitely a bit... hairier, since it has to handle more of our funky
syntax (it drives the PDF download and print-on-demand system on
Wikipedia).
I'm still looking around for good parser generator tools for PHP (we've
been fiddling with PEG.js in some of our JavaScript-side experiments so
far, but we'll eventually need both JS and PHP implementations to cover
editing tools and actual back-end rendering), so if anybody stumbles on
good existing ones, give a shout, or we may have to roll our own.
Bonus points if we can eventually share the formal grammar production rules
between multiple language implementations. :)
-- brion