On 1/23/08, Tim Starling <tstarling@wikimedia.org> wrote:
The new preprocessor has an intermediate XML representation for pages before template inclusion, and it would be possible to store it. There's a RECOVER_ORIG mode that allows the original wikitext to be recovered from the XML. The problems with using it as a storage format are:
- It's useless as an interchange format since it still depends on thousands of lines of MediaWiki code to generate HTML from it.
- The XML format, and the details of the transformation, are subject to change.
- Transformation from wikitext to preprocessed XML is relatively fast, and will hopefully get faster with further development, so it can be generated on demand for any application that needs it.
Hmm, ok. I'm having a bit of trouble picturing this XML format that includes "preprocessed" wikitext but doesn't have templates substituted. Do you mean that at this stage you've parsed the structure of template and parser function calls, but haven't yet substituted in the results?
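As a rough illustration of that reading, here is a toy Python sketch. It is not MediaWiki's code or its actual XML schema: the element names (`root`, `template`, `title`, `part`) and the functions `to_tree` and `recover_original` are assumptions made up for this example, and nested templates are ignored. It only shows the idea of recording the structure of a template call without expanding it, while keeping enough information to get the original wikitext back.

```python
import re
import xml.etree.ElementTree as ET

# Matches a non-nested {{title|arg|...}} call; nesting is ignored in this toy.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def to_tree(wikitext):
    """Parse template calls into nodes WITHOUT expanding them (hypothetical schema)."""
    root = ET.Element("root")
    root.text = ""
    last = None  # most recently added template node
    pos = 0
    for m in TEMPLATE_RE.finditer(wikitext):
        literal = wikitext[pos:m.start()]
        # Literal text goes before the first child or after the previous one.
        if last is None:
            root.text += literal
        else:
            last.tail = literal
        title, *parts = m.group(1).split("|")
        node = ET.SubElement(root, "template")
        ET.SubElement(node, "title").text = title
        for part in parts:
            ET.SubElement(node, "part").text = part
        last = node
        pos = m.end()
    rest = wikitext[pos:]
    if last is None:
        root.text += rest
    else:
        last.tail = rest
    return root


def recover_original(root):
    """Rebuild the original wikitext from the tree (the 'recover' direction)."""
    out = [root.text or ""]
    for node in root:
        fields = [node.find("title").text or ""]
        fields += [p.text or "" for p in node.findall("part")]
        out.append("{{" + "|".join(fields) + "}}")
        out.append(node.tail or "")
    return "".join(out)
```

For example, `to_tree("See {{Infobox|name=Foo}} and {{stub}}.")` yields a tree with two `template` children whose contents are still just names and arguments, and `recover_original` on that tree returns the input string unchanged.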
I think I agree that such an early stage of processing is not the place to generate an exchange format.
However, the XHTML level seems too late: instead of a neat "image" node, you'd end up with all the DIV tags used to actually display the thing in MediaWiki - as opposed to being an abstract representation.
Anyway, if I have understood the situation, writing an exporter for an interchange format would just be a lot of work, with no special benefit for us, to solve a problem that there isn't currently any great demand for. I think.
Steve