I have just committed an initial version of a PHP wrapper library for my parser.
http://svn.wikimedia.org/svnroot/mediawiki/trunk/parsers/libmwparser
An example of how it can be used:
    include("mwp.php");

    $istream = MWParserOpenString("input", "<strong id=hello>Hello World!", MWPARSER_UTF8);
    $parser = new_MWPARSER($istream);
    $out = MWParseArticle($parser);
    print implode($out). "\n";
    MWParserCloseInputStream($istream);

    $istream = MWParserOpenString("input", "{|\n|[[Hello|hello world!]]", MWPARSER_UTF8);
    MWParserReset($parser, $istream);
    $out = MWParseArticle($parser);
    print implode($out). "\n";
which gives the following output:
    <p><strong id="hello">Hello World!</strong></p>
    <table><tbody><tr><td><!-- BEGIN INTERNAL LINK [Hello] -->hello world!<!-- END INTERNAL LINK --></td></tr></tbody></table>
As you can see, I haven't sorted out the internal link resolution yet. But there is an efficient solution to this: make the database lookup after the lexer has run, before the parser runs. This is possible as all internal links are already known at that stage, and it would enable the parser to generate the links directly without any postprocessing.
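A rough sketch of that ordering in C, to make the idea concrete. Everything named here is made up for illustration and is not taken from libmwparser: the Token and LinkTable types, the function names, the /wiki/ URL prefix and the class="new" marker for missing pages. The `known` array stands in for the result of a single batched database query, and titles are not escaped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token stream; not libmwparser's real lexer output. */
typedef enum { TOK_TEXT, TOK_LINK_OPEN, TOK_LINK_CLOSE } TokenKind;

typedef struct {
    TokenKind kind;
    const char *text; /* link target for TOK_LINK_OPEN, literal text for TOK_TEXT */
} Token;

#define MAX_LINKS 64

typedef struct {
    const char *title[MAX_LINKS];
    int exists[MAX_LINKS];
    size_t count;
} LinkTable;

/* Step 1 (after lexing): collect each distinct link target once. */
void collect_links(const Token *toks, size_t n, LinkTable *tab)
{
    tab->count = 0;
    for (size_t i = 0; i < n; i++) {
        if (toks[i].kind != TOK_LINK_OPEN)
            continue;
        size_t j = 0;
        while (j < tab->count && strcmp(tab->title[j], toks[i].text) != 0)
            j++;
        if (j == tab->count && tab->count < MAX_LINKS) {
            tab->title[tab->count] = toks[i].text;
            tab->exists[tab->count] = 0;
            tab->count++;
        }
    }
}

/* Step 2: stand-in for one batched database lookup of all collected titles. */
void resolve_links(LinkTable *tab, const char *const *known, size_t nknown)
{
    for (size_t i = 0; i < tab->count; i++)
        for (size_t k = 0; k < nknown; k++)
            if (strcmp(tab->title[i], known[k]) == 0)
                tab->exists[i] = 1;
}

int link_exists(const LinkTable *tab, const char *title)
{
    for (size_t i = 0; i < tab->count; i++)
        if (strcmp(tab->title[i], title) == 0)
            return tab->exists[i];
    return 0;
}

/* Step 3 (parsing): emit final links directly, no postprocessing pass.
 * Returns the number of bytes written; stops early if the buffer is full. */
size_t render(const Token *toks, size_t n, const LinkTable *tab,
              char *out, size_t cap)
{
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = 0;
        switch (toks[i].kind) {
        case TOK_LINK_OPEN:
            w = snprintf(out + len, cap - len, "<a href=\"/wiki/%s\"%s>",
                         toks[i].text,
                         link_exists(tab, toks[i].text) ? "" : " class=\"new\"");
            break;
        case TOK_LINK_CLOSE:
            w = snprintf(out + len, cap - len, "</a>");
            break;
        case TOK_TEXT:
            w = snprintf(out + len, cap - len, "%s", toks[i].text);
            break;
        }
        if (w < 0 || (size_t)w >= cap - len)
            break; /* output truncated */
        len += (size_t)w;
    }
    return len;
}
```

The point of the split is that the number of database round trips no longer depends on the number of links in the article: collect_links() runs once over the tokens, resolve_links() is the only place that would touch the database, and render() only reads the table.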
Since the library doesn't completely replace the current parser, it will take a bit of surgery to insert it into an instance of MediaWiki. I haven't tried this yet.
There is a lot of tedious work left to do before everything is completed. For instance, a large part of Sanitizer.php must be ported over to C in order to validate the HTML attributes.
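To give an idea of what the attribute validation amounts to, here is a minimal whitelist check in C. The attribute lists below are a small illustrative subset that I picked for the example, not a copy of what Sanitizer.php allows, and the function name is hypothetical. Names are assumed to be lowercased already.

```c
#include <string.h>

/* Illustrative subset only; not Sanitizer.php's actual whitelist. */
static const char *const common_attrs[] = { "id", "class", "lang", "dir", "title", "style", NULL };
static const char *const cell_attrs[]   = { "rowspan", "colspan", "align", "valign", NULL };

typedef struct {
    const char *element;
    const char *const *attrs; /* NULL-terminated list of extra attributes */
} ElementAttrs;

static const ElementAttrs element_attrs[] = {
    { "td", cell_attrs },
    { "th", cell_attrs },
};

static int in_list(const char *const *list, const char *name)
{
    for (; *list != NULL; list++)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if `attr` may appear on `element`, 0 if it should be dropped. */
int attribute_allowed(const char *element, const char *attr)
{
    if (in_list(common_attrs, attr))
        return 1;
    for (size_t i = 0; i < sizeof element_attrs / sizeof element_attrs[0]; i++)
        if (strcmp(element_attrs[i].element, element) == 0)
            return in_list(element_attrs[i].attrs, attr);
    return 0;
}
```

With these tables, id on a strong element passes, while onclick on any element is rejected. The real work is in porting the full lists and the attribute value checks, which this sketch leaves out.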
Best regards,
/Andreas
wikitext-l@lists.wikimedia.org