I think we should just generate PHP and JS from whatever powers the
ParserPlayground. Currently that's PEG.js, and the JS it generates
isn't very JS-like at all; it reads more like a C program, so it's
readily portable to PHP.
(Actually we might want to modify it so that it generates more concise
JS code; I found I could shrink it by about 70% with some hand-applied
transformations.)
That said, there are varying PEG syntaxes[1] and we may find that PEG.js
isn't the best of them.
[1] don't make me say "syntactes"
On 6/29/11 4:48 PM, Brion Vibber wrote:
On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <peter017(a)gmail.com> wrote:
I have been working as a student on the 2011 edition of the Google
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
My mentor is Erik Rose.
For this purpose, we use a Python PEG parser called Pijnu [2] and
implement a grammar for it [3]. This way, we parse the wikitext into
an abstract syntax tree that we will then transform to HTML or other
formats.
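As a rough illustration of the parse-to-AST-then-render pipeline described above, here is a minimal hand-written sketch in Python. It does not use Pijnu or its API, and the tiny grammar (bold, italic, plain text), the tuple-based node shape, and the names parse_inline, parse_delimited and to_html are all assumptions made up for this example. It only mimics PEG-style ordered choice; it is nowhere near the real grammar.

```python
from html import escape

# Hypothetical mini-grammar, written PEG-style (not the real MediaWiki grammar):
#   inline <- (bold / italic / text)*
#   bold   <- "'''" inline "'''"
#   italic <- "''" inline "''"
#   text   <- any single character

def parse_delimited(src, pos, delim, kind):
    """Try to match delim ... delim at pos; return (node, new_pos) or None."""
    if not src.startswith(delim, pos):
        return None
    children, end = parse_inline(src, pos + len(delim), stop=delim)
    if not src.startswith(delim, end):
        return None  # no closing delimiter: fail so the caller falls back to text
    return (kind, children), end + len(delim)

def parse_inline(src, pos=0, stop=None):
    """Parse until the stop delimiter or end of input; return (nodes, new_pos)."""
    nodes = []
    while pos < len(src):
        if stop is not None and src.startswith(stop, pos):
            break
        # Ordered choice: bold is tried before italic because ''' begins with ''.
        result = (parse_delimited(src, pos, "'''", "bold")
                  or parse_delimited(src, pos, "''", "italic"))
        if result:
            node, pos = result
            nodes.append(node)
        elif nodes and nodes[-1][0] == "text":
            # Merge consecutive characters into one text node.
            nodes[-1] = ("text", nodes[-1][1] + src[pos])
            pos += 1
        else:
            nodes.append(("text", src[pos]))
            pos += 1
    return nodes, pos

TAGS = {"bold": "b", "italic": "i"}

def to_html(nodes):
    """Render the AST (a list of (kind, value) tuples) to an HTML string."""
    out = []
    for kind, value in nodes:
        if kind == "text":
            out.append(escape(value, quote=False))
        else:
            tag = TAGS[kind]
            out.append(f"<{tag}>{to_html(value)}</{tag}>")
    return "".join(out)

if __name__ == "__main__":
    ast, _ = parse_inline("'''bold''' and ''it''")
    print(ast)
    print(to_html(ast))
```

Keeping the AST separate from the renderer is what would let the same tree be turned into HTML or some other format later. An unclosed delimiter simply falls through to plain text here, which is one possible choice and not necessarily what a full parser should do.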
One of the advantages of Pijnu is the simplicity and readability of
the grammar definition [3]. It is not finished yet, but what we have
done so far seems very promising.
Neat! Your life is definitely made easier by skipping full compatibility
with some of our freakier syntax oddities ;) which'll still be very
handy for various embedded-style "lite wiki" usages.
Great list of alternatives, libraries & algorithms in your notes too
though obviously mostly Python-oriented; looks like you've already
looked at PediaPress's mwlib library, which is also Python-based. It's
definitely a bit... hairier due to having to handle more of our funky
syntax (it drives the PDF download and print-on-demand system on Wikipedia).
I'm still looking around for good parser generator tools for PHP (we've
been fiddling with PEG.js in some of our JavaScript-side experiments so
far but will eventually need both JS and PHP implementations to cover
editing tools and actual back-end rendering), so if anybody stumbles on
good existing ones give a shout or we may have to roll some of our own.
Bonus points if we can eventually share the formal grammar production
rules between multiple language implementations. :)
-- brion
--
Neil Kandalgaonkar <neilk(a)wikimedia.org>