On 2/14/08, Daniel Kinzler daniel@brightbyte.de wrote:
Yes, though for the parser, there are three cases to consider for HTML/XML style tags:
1) (Whitelisted) HTML tags, which can occur "soupy", and are more or less passed through (or "tidied" into valid XHTML).
2) Other tags (potentially handled by an extension) which must match in pairs exactly and cause the parser to take anything *in between* LITERALLY, and pass it to the extension for processing.
3) In case there is no such extension, it needs to go back, read the *tags* literally, and then parse the text between the tags.
There's even a fourth case, namely magic tags like <nowiki> that have to be known to the parser for special handling. These may also include <includeonly>, <onlyinclude> and <noinclude>, though those might be handled by the preprocessor; I'm not sure about that.
My grammar does almost all of this; I just need to make extensions opaque, which is easy. Except that 3) is really the default anyway: there is no "going back" as such.
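To make the cases concrete, here is a rough Python sketch of the dispatch (not the actual grammar). The tag sets and the names classify_tag and split_opaque are hypothetical, and attributes on tags are ignored for brevity. Note that the "literal" case is simply the fall-through, so nothing has to be re-read:

```python
import re

# Hypothetical tag sets, for illustration only; the real lists come from
# MediaWiki's configuration and the installed extensions.
HTML_WHITELIST = {"b", "i", "span", "div"}
EXTENSION_TAGS = {"math", "ref"}
MAGIC_TAGS = {"nowiki"}

# Tags whose content the parser must not look inside.
OPAQUE_TAGS = EXTENSION_TAGS | MAGIC_TAGS


def classify_tag(name):
    """Return which of the four cases a tag name falls into."""
    name = name.lower()
    if name in MAGIC_TAGS:
        return "magic"
    if name in EXTENSION_TAGS:
        return "extension"
    if name in HTML_WHITELIST:
        return "html"
    # Case 3 is the default: an unknown tag is just literal text.
    return "literal"


# Only opaque tags are matched as exact pairs; everything else stays wikitext.
_OPAQUE_RE = re.compile(
    r"<(%s)>(.*?)</\1>" % "|".join(sorted(re.escape(t) for t in OPAQUE_TAGS)),
    re.DOTALL | re.IGNORECASE,
)


def split_opaque(text):
    """Yield (kind, payload) pairs; opaque bodies are passed through unparsed."""
    pos = 0
    for m in _OPAQUE_RE.finditer(text):
        if m.start() > pos:
            yield ("wikitext", text[pos:m.start()])
        yield (classify_tag(m.group(1)), m.group(2))
        pos = m.end()
    if pos < len(text):
        yield ("wikitext", text[pos:])
```

Everything yielded as "wikitext" (including whitelisted HTML and unknown tags) would go on to normal parsing, while "extension" and "magic" payloads are handed over untouched.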
I'm not dealing with <includeonly> etc. yet; I'm assuming they're handled by the preprocessor. Am I wrong?
Steve