On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <span dir="ltr">&lt;<a href="mailto:peter017@gmail.com">peter017@gmail.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
I have been working as a student on the 2011 edition of the Google<br>
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.<br>
My mentor is Erik Rose.<br>
<br>
For this purpose, we use a Python PEG parser called Pijnu [2] and<br>
implement a grammar for it [3]. This way, we parse the wikitext into<br>
an abstract syntax tree that we will then transform to HTML or other<br>
formats.<br>
<br>
One of the advantages of Pijnu is the simplicity and readability of<br>
the grammar definition [3]. It is not finished yet, but what we have<br>
done so far seems very promising.<br></blockquote><div><br>Neat! Your life is definitely made easier by skipping full compatibility with some of our freakier syntax oddities ;) and the result will still be very handy for various embedded-style "lite wiki" uses.<br>
<br>Great list of alternatives, libraries &amp; algorithms in your notes too, though obviously mostly Python-oriented; looks like you've already looked at PediaPress's mwlib library, which is also Python-based. It's definitely a bit... hairier due to having to handle more of our funky syntax (it drives the PDF download and print-on-demand system on Wikipedia).<br>
<br>I'm still looking around for good parser generator tools for PHP. We've been fiddling with PEG.js in some of our JavaScript-side experiments so far, but we will eventually need both JS and PHP implementations to cover editing tools and actual back-end rendering. So if anybody stumbles on good existing ones, give a shout, or we may have to roll some of our own.<br>
<br>Bonus points if we can eventually share the formal grammar production rules between multiple language implementations. :)<br><br>-- brion<br></div></div>