[Wikitext-l] MediaWiki parser in Python

Erik Rose erik at mozilla.com
Mon Jul 11 23:45:51 UTC 2011


> I have recently subscribed to this list and I wanted to introduce myself.
> 
> I have been working as a student on the 2011 edition of the Google
> Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
> My mentor is Erik Rose.

...which probably means I should introduce myself as well. :-)

Hi! I'm Erik Rose, and I work on support.mozilla.com, where we keep thousands of support articles in a variant of MediaWiki syntax. Even outside Wikipedia itself (though no doubt driven by it), MW syntax has such huge mindshare that our volunteers pretty much demanded it. At the moment, we use what is basically a straight port of the PHP implementation to Python, with painfully Byzantine layers built around it to implement some custom syntax. Our summer project is to simplify this mess by building the most comprehensible, extensible MW parser available for Python:

* You'll be able to plug your own custom syntax bits into it without messing with the code.
* You can get the raw AST if you like. Or you can pass in transformation functions to customize the output of various nodes.
* We'll also provide hooks so you can do whatever you want with MW "product" features like includes, templates, and such (as opposed to pure "language" features).
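
To make the second bullet a bit more concrete, here is a minimal sketch of what "pass in transformation functions to customize the output of various nodes" could look like. Everything in it (the `Node` class, the `render` function, the node kinds "paragraph" and "bold") is hypothetical and invented for illustration; it is not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


# Hypothetical AST node, NOT the real mediawiki-parser data structure.
@dataclass
class Node:
    kind: str  # e.g. "paragraph", "bold"
    children: List[Union["Node", str]] = field(default_factory=list)


# A transformation receives the node and its already-rendered children.
Transform = Callable[[Node, str], str]


def render(node: Union[Node, str], transforms: Dict[str, Transform]) -> str:
    """Render a tree bottom-up, letting the caller override output per node kind."""
    if isinstance(node, str):
        return node  # plain text leaf
    inner = "".join(render(child, transforms) for child in node.children)
    transform = transforms.get(node.kind)
    # Node kinds without a transformation fall through to their children's text.
    return transform(node, inner) if transform else inner


# Caller-supplied transformations, keyed by (assumed) node kind.
html_transforms = {
    "paragraph": lambda node, inner: f"<p>{inner}</p>",
    "bold": lambda node, inner: f"<b>{inner}</b>",
}

tree = Node("paragraph", ["Hello, ", Node("bold", ["world"]), "!"])
print(render(tree, html_transforms))
```

Passing an empty dict instead of `html_transforms` yields the bare text, which is one way a caller could get plain output from the same raw tree without touching the parser.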

As Peter already mentioned, our project's home is https://github.com/erikrose/mediawiki-parser. Or you might look at Peter's fork. Sometimes his is more up-to-date, sometimes mine.

We have most of the productions working now. Peter's working on templates at the moment, which will probably involve a pre-parsing phase. After that it's on to apostrophes, where I'm hoping we can crib from other people's work. :-)
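
As a rough illustration of what a template pre-parsing phase might mean, the sketch below expands template calls in the raw text before any real parsing happens. It is only an assumption about the general shape, not the design the project settled on: the names `expand_templates` and `load_template` are made up, and the regex handles only simple `{{Name}}` calls with no arguments.

```python
import re
from typing import Callable

# Matches only simple, argument-free calls such as {{Name}} (an assumed subset).
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\}\}")


def expand_templates(
    source: str,
    load_template: Callable[[str], str],
    max_passes: int = 10,
) -> str:
    """Substitute template calls repeatedly until the text stops changing.

    `load_template` is a caller-supplied hook that maps a template name to its
    wikitext; `max_passes` guards against templates that include each other.
    """
    for _ in range(max_passes):
        expanded = TEMPLATE_RE.sub(
            lambda match: load_template(match.group(1).strip()), source
        )
        if expanded == source:
            break
        source = expanded
    return source


# Hypothetical template store; unknown names expand to an empty string.
templates = {"Greeting": "Hello from {{Site}}", "Site": "'''Example'''"}


def lookup(name: str) -> str:
    return templates.get(name, "")


print(expand_templates("{{Greeting}}!", lookup))
```

The expanded text would then go to the main parser as ordinary wikitext, so the grammar itself never has to know where template content comes from.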

It's great to see other folks thinking about the language. I'm sure we'll talk soon!

Erik Rose

