Magnus Manske wrote:
Jim Higson wrote:
A while ago I started some experimental client software that took the output from wiki2xml. I got sidetracked, but now that I have some more time I want to get back to it.
A few questions:
I've searched the list and see there is now a proper flex/bison parser. The wiki2xml converter has not had any check-ins for a while, so I presume it's now defunct?
Yup. If you know Bison, we'd be glad if you could take a look at it. Especially the HTML parsing needs a lot of work.
I'm afraid not. I did a class in lex+yacc last year, so I mostly know my way around a spec, but I have no experience using it for a real language, especially one like wikitext, which wasn't designed with formal grammars in mind.
A quick overview of what I'm doing: for my undergraduate dissertation I'm writing a partial reimplementation of the MediaWiki interface without any dynamic component on the server. This isn't intended to replace the current PHP interface; I'm running it as an experiment into what is possible using very low-spec web servers.
At the moment what I've got uses a JavaScript half-port of wiki2xml. If the project were taken any further, it would have to use a parser functionally identical to the Bison one, which as far as I can see would involve either modifying Bison to output JavaScript (very hard) or a C-to-JavaScript converter (also very hard!). As you can probably tell, I'll never fully reimplement the parsing process, and I don't intend this code to be used except as a neat demonstration. Still, I'd like my intermediate XML format to be close to the 'official' one, because my presentation layer might one day be teamed up with a server-side parser (using something like &action=parsedxml instead of &action=raw). Even so, it isn't trying to be a replacement interface: it places too many requirements on the client, and for the /Special:foo pages it will probably always delegate to PHP. At best it might one day be possible to run this project in parallel with a MediaWiki wiki.
In the flexbisonparse module, there is also a "preprocessor" of mine which tries to convert HTML to wikitext as far as possible, which should make the parser code simpler. With the preprocessor, basically only <div> and <font> need to be handled by the parser, plus the usual wiki tags (<pre>, <nowiki>, <math> etc.).
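To make that division of labour concrete, here is a minimal Python sketch of an HTML-to-wikitext pass of this kind. It is not the flexbisonparse preprocessor. The tag mapping, the function names `html_to_wikitext` and `_convert`, and the regex approach are all assumptions for illustration. It rewrites a few simple, attribute-less tags into wiki markup, skips anything inside <nowiki>, <pre> or <math>, and leaves tags like <div> and <font> for the parser.

```python
import re

# Hypothetical mapping from simple HTML tags to wikitext markup.
# These rules are an illustrative assumption, not the real preprocessor's.
SIMPLE_TAGS = {
    "b": ("'''", "'''"),
    "strong": ("'''", "'''"),
    "i": ("''", "''"),
    "em": ("''", "''"),
    "h2": ("== ", " =="),
    "h3": ("=== ", " ==="),
}

# Blocks whose contents must be passed through untouched.
PROTECTED = re.compile(r"<(nowiki|pre|math)\b[^>]*>.*?</\1>",
                       re.IGNORECASE | re.DOTALL)


def _convert(fragment: str) -> str:
    """Rewrite attribute-less simple tags in a fragment with no protected blocks."""
    for tag, (open_markup, close_markup) in SIMPLE_TAGS.items():
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.IGNORECASE | re.DOTALL)
        # The lambda is used immediately inside this iteration, so the
        # open/close markup it captures is the current tag's.
        fragment = pattern.sub(lambda m: open_markup + m.group(1) + close_markup,
                               fragment)
    return fragment


def html_to_wikitext(text: str) -> str:
    """Convert simple HTML to wikitext; other tags (e.g. <div>, <font>) are left as-is."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(text):
        out.append(_convert(text[pos:m.start()]))  # convert text before the block
        out.append(m.group(0))                     # copy the protected block verbatim
        pos = m.end()
    out.append(_convert(text[pos:]))               # convert the remaining tail
    return "".join(out)


if __name__ == "__main__":
    print(html_to_wikitext("<b>bold</b> and <i>it</i>"))
    print(html_to_wikitext("<b>x</b><pre><i>y</i></pre>"))
```

Regexes only cope with flat, attribute-free tags here. Nested or malformed HTML is exactly the part a real preprocessor, or the Bison grammar itself, has to deal with.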
Does the flex/bison parser produce roughly the same XML as wiki2xml (same tag names, nesting, etc.)?
No. But the new one is better! :-)
Good, except this means a bit more work for me ;)
Is there a DTD or XML schema for the wiki XML? How about a rough spec?
No DTD or the like, but try the example at the end of this mail (I can't attach files on the mailing list...)
Your help with the parser would be much appreciated.
I wish I could give more help with it, but I can't really do much of anything until this dissertation is done. After that, possibly.
The example was very helpful, thanks.
Jim