On 1/25/08, Thomas Dalton thomas.dalton@gmail.com wrote:
However, the XHTML level seems too late: instead of a neat "image" node, you'd end up with all the DIV tags used to actually display the thing in MediaWiki - as opposed to being an abstract representation.
DIV tags *are* part of the abstract representation - it's the CSS that handles the display.
Well, not as abstract as, say, the new WikiCreole interchange format:
<xsd:complexType name="imageType">
  <xsd:sequence>
    <xsd:element name="uri" type="xsd:string"/>
    <xsd:element name="alternative" type="simpletextType" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
There is then an XSLT layer to convert from that to actual XHTML:

<xsl:template match="image">
  <xsl:text disable-output-escaping="yes">&lt;img src="</xsl:text>
  <xsl:value-of select="uri"/>
  <xsl:text disable-output-escaping="yes">"/&gt;</xsl:text>
</xsl:template>
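To make the mapping concrete, here is a rough Python sketch of the same image-to-img step. It is only an illustration, not WikiCreole's or MediaWiki's code: the function name `image_to_xhtml` is made up, and it assumes the `alternative` element holds plain text (I haven't checked what `simpletextType` actually allows). It builds the output as an element rather than concatenating strings, so no output-escaping tricks are needed.

```python
import xml.etree.ElementTree as ET


def image_to_xhtml(image_xml: str) -> str:
    """Convert an abstract <image> node into an XHTML <img/> element.

    Hypothetical helper: assumes an <image> element with a required <uri>
    child and an optional, plain-text <alternative> child.
    """
    node = ET.fromstring(image_xml)
    if node.tag != "image":
        raise ValueError("expected an <image> element")

    uri = node.findtext("uri")
    if uri is None:
        raise ValueError("<image> requires a <uri> child")

    # Build the output as a tree so attribute values are escaped for us.
    img = ET.Element("img", {"src": uri})
    alt = node.findtext("alternative")
    if alt is not None:
        img.set("alt", alt)
    return ET.tostring(img, encoding="unicode")


if __name__ == "__main__":
    print(image_to_xhtml(
        "<image><uri>Foo.png</uri><alternative>A foo</alternative></image>"
    ))
```

The point is just that the input node says nothing about DIVs or layout; all of that would be decided in this second step.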
So it looks to me like there are the following layers in a conversion from wikitext to a rendered page:
1. Raw wikitext
2. Pre-processed wikitext before template transclusion
3. Pre-processed wikitext with template transclusion
4. Parsed wikitext into some abstract representation that understands 'bold' and 'image' but doesn't specify display
5. XHTML
6. Visual interpretation of the XHTML as performed by the browser
You've suggested that the XML generated by MediaWiki at layer 2 is no good as an interchange format, and that layer 5 is suitable. I was (am?) just wondering about the benefits of splitting the XHTML-generating parser into layers 4 and 5, and making layer 4 generate an XML interchange format, possibly compatible with WikiCreole's.
From a programming perspective it seems nice to have a true *parser* which focuses on processing input, then a *code generator* (probably written in XSLT) that produces output.
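As a toy illustration of that split, here is a minimal Python sketch (Python rather than XSLT purely for brevity). Everything in it is an assumption for the sake of the example: the node classes `Text`, `Bold` and `Image`, the functions `parse` and `to_xhtml`, and a grammar that only knows '''bold''' and [[Image:uri|alt]]. It is nowhere near real wikitext and is not how the current parser works.

```python
import re
from dataclasses import dataclass
from html import escape
from typing import List, Union


# Abstract representation: nodes know what they are, not how they look.
@dataclass
class Text:
    value: str


@dataclass
class Bold:
    value: str


@dataclass
class Image:
    uri: str
    alternative: str = ""


Node = Union[Text, Bold, Image]

# Toy grammar (assumption): '''bold''' and [[Image:uri|alt]] only.
TOKEN = re.compile(
    r"'''(?P<bold>.+?)'''"
    r"|\[\[Image:(?P<uri>[^|\]]+)(?:\|(?P<alt>[^\]]*))?\]\]"
)


def parse(wikitext: str) -> List[Node]:
    """Parser stage: input text to abstract nodes, no markup decisions."""
    nodes: List[Node] = []
    pos = 0
    for m in TOKEN.finditer(wikitext):
        if m.start() > pos:
            nodes.append(Text(wikitext[pos:m.start()]))
        if m.group("bold") is not None:
            nodes.append(Bold(m.group("bold")))
        else:
            nodes.append(Image(m.group("uri"), m.group("alt") or ""))
        pos = m.end()
    if pos < len(wikitext):
        nodes.append(Text(wikitext[pos:]))
    return nodes


def to_xhtml(nodes: List[Node]) -> str:
    """Code generator stage: abstract nodes to XHTML, no parsing."""
    out = []
    for n in nodes:
        if isinstance(n, Text):
            out.append(escape(n.value))
        elif isinstance(n, Bold):
            out.append("<b>" + escape(n.value) + "</b>")
        elif isinstance(n, Image):
            out.append('<img src="%s" alt="%s"/>'
                       % (escape(n.uri), escape(n.alternative)))
    return "".join(out)


if __name__ == "__main__":
    print(to_xhtml(parse("See '''this''' [[Image:Foo.png|A foo]]!")))
```

With this shape, the list returned by `parse` is the thing you would serialise as the interchange format, and `to_xhtml` could be swapped for a different generator without touching the parser.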
Obviously I'm only talking about doing this in a new parser, if/when that happens. The benefits would be too small to contemplate hacking that into the current parser, I would think?
Steve