I think we should just generate PHP and JS from whatever powers the ParserPlayground. Currently that's PEG.js, and the JS it generates isn't very JS-like at all; it reads more like a C program, so it's readily portable to PHP.
(Actually we might want to modify it so that it generates more concise JS code; I found I could shrink it by about 70% with some hand-applied transformations.)
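To make the "one grammar, two back ends" idea concrete, here is a minimal sketch in Python. It is not PEG.js and does not use PEG.js grammar syntax: the grammar format, the `Target` and `Emitter` names, and the `parse_<rule>(s, i)` convention are all made up for illustration, and it only handles ASCII literals, sequence, ordered choice and star. Each generated function takes a string and a position and returns the new position or -1, which is the C-like style that carries over to PHP almost unchanged.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Toy PEG grammar as plain data (hypothetical format, not PEG.js syntax).
# Node shapes: ("lit", text), ("ref", rule), ("seq", [nodes]),
#              ("alt", [nodes]), ("star", node)
GRAMMAR = {
    "bold": ("seq", [("lit", "'''"), ("ref", "word"), ("lit", "'''")]),
    "word": ("star", ("alt", [("lit", "a"), ("lit", "b")])),
}

def php_quote(text):
    """Single-quoted PHP string literal (only \\ and ' need escaping)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

@dataclass
class Target:
    sigil: str                   # variable prefix: "$" for PHP, "" for JS
    quote: Callable[[str], str]  # string-literal quoting for this language
    substr: str                  # template: n chars of s starting at i
    declare: bool                # whether locals need a declaration line

JS = Target("", json.dumps, "{s}.substring({i}, {i} + {n})", True)
PHP = Target("$", php_quote, "substr({s}, {i}, {n})", False)

class Emitter:
    def __init__(self, target):
        self.t = target
        self.temps = []

    def fresh(self):
        name = f"{self.t.sigil}t{len(self.temps)}"
        self.temps.append(name)
        return name

    def emit(self, node, inp, out, pad):
        """Lines that set `out` to the position after matching `node`
        at position `inp`, or to -1 on failure. Assumes inp != out."""
        kind, arg = node
        s = self.t.sigil + "s"
        if kind == "lit":  # assumption: ASCII literals, so len() fits both targets
            got = self.t.substr.format(s=s, i=inp, n=len(arg))
            return [f"{pad}{out} = ({got} === {self.t.quote(arg)})"
                    f" ? {inp} + {len(arg)} : -1;"]
        if kind == "ref":
            return [f"{pad}{out} = parse_{arg}({s}, {inp});"]
        if kind == "alt":  # ordered choice: first alternative that matches wins
            lines = [f"{pad}{out} = -1;"]
            for child in arg:
                lines.append(f"{pad}if ({out} < 0) {{")
                lines += self.emit(child, inp, out, pad + "  ")
                lines.append(f"{pad}}}")
            return lines
        tmp = self.fresh()
        if kind == "seq":  # each child starts where the previous one ended
            lines = [f"{pad}{out} = {inp};"]
            for child in arg:
                lines.append(f"{pad}if ({out} >= 0) {{")
                lines += self.emit(child, out, tmp, pad + "  ")
                lines.append(f"{pad}  {out} = {tmp};")
                lines.append(f"{pad}}}")
            return lines
        if kind == "star":  # repeat until failure or no progress
            return ([f"{pad}{out} = {inp};", f"{pad}while (true) {{"]
                    + self.emit(arg, out, tmp, pad + "  ")
                    + [f"{pad}  if ({tmp} < 0 || {tmp} === {out}) break;",
                       f"{pad}  {out} = {tmp};",
                       f"{pad}}}"])
        raise ValueError(f"unknown node kind: {kind}")

def generate(grammar, target):
    """Return source text with one parse_<rule>(s, i) function per rule."""
    chunks = []
    for name, node in grammar.items():
        em = Emitter(target)
        s, i, r = (target.sigil + v for v in ("s", "i", "r"))
        body = em.emit(node, i, r, "  ")
        head = [f"function parse_{name}({s}, {i}) {{"]
        if target.declare:
            head.append("  var " + ", ".join([r] + em.temps) + ";")
        chunks.append("\n".join(head + body + [f"  return {r};", "}"]))
    return "\n\n".join(chunks)

if __name__ == "__main__":
    print(generate(GRAMMAR, JS))
    print()
    print(generate(GRAMMAR, PHP))
```

The two listings this prints differ only in the `$` sigils, the `var` line, the substring call and the string quoting, which is the point: if the generated code sticks to plain conditionals and loops, the per-language part of the generator stays very small.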
That said, there are varying PEG syntaxes[1], and we may find that PEG.js isn't the best of them.
[1] don't make me say "syntactes"
On 6/29/11 4:48 PM, Brion Vibber wrote:
On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <peter017@gmail.com> wrote:
I have been working as a student on the 2011 edition of the Google Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation. My mentor is Erik Rose. For this purpose, we use a Python PEG parser called Pijnu [2] and implement a grammar for it [3]. This way, we parse the wikitext into an abstract syntax tree that we will then transform to HTML or other formats. One of the advantages of Pijnu is the simplicity and readability of the grammar definition [3]. It is not finished yet, but what we have done so far seems very promising.
Neat! Your life is definitely made easier by skipping full compatibility with some of our freakier syntax oddities ;) and the result will still be very handy for various embedded-style "lite wiki" usages.
Great list of alternatives, libraries & algorithms in your notes too, though obviously mostly Python-oriented; looks like you've already looked at PediaPress's mwlib library, which is also Python-based. It's definitely a bit... hairier due to having to handle more of our funky syntax (it drives the PDF download and print-on-demand system on Wikipedia).
I'm still looking around for good parser generator tools for PHP (we've been fiddling with PEG.js in some of our JavaScript-side experiments so far, but will eventually need both JS and PHP implementations to cover editing tools and actual back-end rendering), so if anybody stumbles on good existing ones, give a shout, or we may have to roll some of our own.
Bonus points if we can eventually share the formal grammar production rules between multiple language implementations. :)
-- brion
Wikitext-l mailing list Wikitext-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitext-l