Jay R. Ashworth wrote:
On Mon, Aug 28, 2006 at 12:01:00AM +0100, Buttay cyril wrote:
3- the list of alternative parsers ( http://meta.wikimedia.org/wiki/Alternative_parsers ) does not mention wikitext2docbook, and says that flexbisonparse is "Intended as an eventual replacement to the parsing code inside MediaWiki itself", which is rather promising!
I don't know that that is what Magnus is calling it, but that's what it does. I forget what language he's doing it in. Check the list archives; he's mentioned it here in the last couple of months (and may well chime in here).
As someone who's been playing with alternative parsers (though not Magnus), I'm pretty sure the flexbisonparse project is currently dead. Magnus moved to his wiki2xml project (also available in the MediaWiki repository), which is actually coded in PHP. As far as I know, though, it's the single most feature-complete alternative parser we have. Not claiming it's perfect, but it's good... I haven't worked with flexbisonparse, though, so maybe it's better than I know.
I've actually been working on a Python-based wikitext parser, using some techniques that should make the system a bit faster and cleaner... With a lot of luck, I should start making progress on that again in the next month or so.
For anyone who cares, I'll probably be trying to implement a PEG-based parser using mxTextTools, since I think that should be able to parse all of MediaWiki's wikitext, and should be about twice as fast as the current Parser.php (which is about as fast as wiki2xml)... Or I might just end up using ANTLR, if I can bully my current semi-grammar into working in that framework... If anyone knows of a decent PEG parser with a Python API (a packrat parser might be ideal), that'd be great too. *shrugs*
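To make the packrat idea a bit more concrete, here's a rough Python sketch of what I have in mind - not using mxTextTools or ANTLR, and not real MediaWiki grammar, just memoized recursive descent over a toy slice of wikitext ('''bold''' and ''italic'') so you can see the PEG shape. Rule names and the markup it handles are purely illustrative:

from functools import lru_cache

def parse(text):
    # Packrat trick: memoize each (rule, position) result so that
    # backtracking over ordered choices stays roughly linear.
    @lru_cache(maxsize=None)
    def inline(pos):
        # inline <- bold / italic / plain   (ordered choice: bold first,
        # since "'''" also starts with "''")
        return bold(pos) or italic(pos) or plain(pos)

    @lru_cache(maxsize=None)
    def bold(pos):
        # bold <- "'''" (!"'''" .)* "'''"
        if not text.startswith("'''", pos):
            return None
        end = text.find("'''", pos + 3)
        if end == -1:
            return None
        return ('bold', text[pos + 3:end]), end + 3

    @lru_cache(maxsize=None)
    def italic(pos):
        # italic <- "''" (!"''" .)* "''"
        if not text.startswith("''", pos):
            return None
        end = text.find("''", pos + 2)
        if end == -1:
            return None
        return ('italic', text[pos + 2:end]), end + 2

    @lru_cache(maxsize=None)
    def plain(pos):
        # plain <- (!"''" .)+
        end = pos
        while end < len(text) and not text.startswith("''", end):
            end += 1
        if end == pos:
            return None
        return ('text', text[pos:end]), end

    nodes, pos = [], 0
    while pos < len(text):
        result = inline(pos)
        if result is None:
            # nothing matched (e.g. stray quotes); consume one char as text
            nodes.append(('text', text[pos]))
            pos += 1
        else:
            node, pos = result
            nodes.append(node)
    return nodes

print(parse("Some '''bold''' and ''italic'' wikitext"))
# [('text', 'Some '), ('bold', 'bold'), ('text', ' and '),
#  ('italic', 'italic'), ('text', ' wikitext')]

The real thing would obviously need templates, links, tables, apostrophe-ambiguity handling and so on, but the structure - one memoized function per PEG rule, ordered choice, no separate lexer - is the part I'm after.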
- Eric Astor