On Fri, 2004-08-13 at 20:46 +0200, Magnus Manske wrote:
Warning: Yet Another Crazy Idea of Mine ahead. If you're sick of these (by bitter experience;-) delete this mail *now*.
Still here? Great!
OK, we all know that the current parser, while working, is not the final word. It is kinda slow due to being multi-pass, the source is confusing, and there are some persistent bugs in it, like the template malfunctions.
I therefore suggest a new structure:
- Preprocessor
- Wiki markup to XML
- XML to (X)HTML
This is what I'm writing currently, except that the parser will return a DOM tree instead of an XML dump of it. That saves another parse step before postprocessing (template replacement, link status updates, etc., and the final XSLT transform). Besides being able to save the DOM tree as XML at any stage, it's also possible to pickle the Python object, which is a bit faster to wake up than XML.
Caveat: this is based on Python's XML features; I don't know a lot about PHP DOM implementations.
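
To make the DOM-vs-dump point concrete, here is a minimal sketch using only the Python standard library (xml.dom.minidom and pickle). It is not the actual parser: build_tree and the element names "article" and "paragraph" are made up for illustration, and a real parser would build the tree from wiki markup rather than from two strings.

```python
import pickle
from xml.dom.minidom import Document, parseString


def build_tree(title, text):
    """Hypothetical stand-in for the parser: returns a DOM tree, not a string."""
    doc = Document()
    article = doc.createElement("article")  # element names are illustrative only
    article.setAttribute("title", title)
    doc.appendChild(article)
    para = doc.createElement("paragraph")
    para.appendChild(doc.createTextNode(text))
    article.appendChild(para)
    return doc


doc = build_tree("Sandbox", "Hello, wiki.")

# Option 1: dump to XML at any stage; reading it back costs a full parse.
xml_dump = doc.toxml()
restored_from_xml = parseString(xml_dump)

# Option 2: pickle the Python object; loading it skips the XML parser.
blob = pickle.dumps(doc)
restored_from_pickle = pickle.loads(blob)
```

Postprocessing steps such as template replacement can then work on the tree directly, and both restored objects are ordinary DOM documents again. Whether the pickle really loads faster depends on tree size and Python version, so it is worth timing on real articles.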