On 11/23/07, Tim Starling tstarling@wikimedia.org wrote:
Apostrophes are converted to HTML in doAllQuotes(). Invalid HTML on input is cleaned up in removeHTMLtags(). Both are now considered to be *after* the preprocessor. So in your example, the preprocessor will produce:
Crazy: ''italics'' '''open-bold <br /> stuff'''.
Ok, that's good then.
The only things that really need escaping from the preprocessor are the characters "{|=}", and "<" when it occurs before the name of a registered tag hook. For "|" there is the old hack {{!}}, a template which contains just "|". This takes advantage of the uncovered syntax rules in the preprocessor to hide a character from the preprocessor, passing a literal "|" through to the main pass. It's used for table syntax. This mechanism could be extended and standardised, say with a "urldecode" parser function, to put any arbitrary character into the preprocessor output.
So, all those characters, if escaped, will appear to the main parser unescaped, i.e. as if they had been typed in directly. That's also good.
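To check my understanding of the {{!}} trick, here is a toy sketch in Python. It is not MediaWiki's code: toy_preprocess, split_args and hypothetical_urldecode are names made up for illustration, and the "urldecode" helper only mimics the parser function proposed above, which is an idea rather than an existing feature. The point is that arguments are split on top-level "|" first, and {{!}} only turns into "|" afterwards, so the pipe never acts as a separator.

```python
from urllib.parse import unquote


def split_args(body):
    """Split the inside of a {{...}} call on top-level '|' only."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        two = body[i:i + 2]
        if two == "{{":
            depth += 1
            current.append(two)
            i += 2
        elif two == "}}":
            depth -= 1
            current.append(two)
            i += 2
        elif body[i] == "|" and depth == 0:
            # A bare pipe at the top level separates arguments.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def toy_preprocess(call):
    """Return (template name, expanded args) for a call like '{{name|a|b}}'."""
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("expected a {{...}} call")
    name, *args = split_args(call[2:-2])
    # Expansion happens after splitting, so the '|' produced here is
    # passed through as a literal character.
    return name, [arg.replace("{{!}}", "|") for arg in args]


def hypothetical_urldecode(encoded):
    """Stand-in for the proposed 'urldecode' parser function (an assumption)."""
    return unquote(encoded)


print(toy_preprocess("{{cell|a {{!}} b|c}}"))  # ('cell', ['a | b', 'c'])
print(toy_preprocess("{{cell|a | b|c}}"))      # ('cell', ['a ', ' b', 'c'])
print(hypothetical_urldecode("%7B%7C%3D%7D"))  # {|=}
```

In the first call the escaped pipe survives as one argument "a | b"; in the second, the bare pipe splits the argument in two.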
Tags such as <gallery> work by an uglier and more fragile method, i.e. with strip markers. Strip markers are placeholders passed through to the
Ok, does the same go for <ref>? I haven't yet seen a <gallery> be transcluded, but it could happen for <ref>. Presumably the parser will have to be able to recognise and ignore the strip markers. These are the "UNIQ" codes that get dotted in, yeah?
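For what it's worth, my mental model of strip markers is roughly the sketch below, in Python. It is only a guess at the general idea, not the real parser: the marker format, the function names (strip_tags, toy_main_pass, unstrip) and the "ref" hook are all invented for illustration, and it ignores self-closing and nested tags.

```python
import re


def strip_tags(text, hooks, store):
    """Replace each <tag>...</tag> that has a hook with an opaque marker."""
    names = "|".join(re.escape(name) for name in hooks)
    pattern = re.compile(rf"<({names})\b[^>]*>(.*?)</\1>", re.DOTALL)

    def repl(match):
        # Made-up marker format; the real one may differ.
        marker = "\x7fUNIQ-toy-%s-%08d-QINU\x7f" % (match.group(1), len(store))
        # Render the tag content now and keep it aside until the end.
        store[marker] = hooks[match.group(1)](match.group(2))
        return marker

    return pattern.sub(repl, text)


def toy_main_pass(text):
    """Stand-in for the main pass: only converts ''italics''."""
    return re.sub(r"''(.*?)''", r"<i>\1</i>", text)


def unstrip(text, store):
    """Put the stored output back in place of each marker."""
    for marker, rendered in store.items():
        text = text.replace(marker, rendered)
    return text


store = {}
hooks = {"ref": lambda inner: "[note: " + inner + "]"}  # hypothetical hook
stripped = strip_tags("''A''<ref>''b''</ref>", hooks, store)
print(unstrip(toy_main_pass(stripped), store))  # <i>A</i>[note: ''b'']
```

The main pass sees only the marker, so the apostrophes inside the tag are left alone. That is why anything sitting between stripping and unstripping has to recognise the markers and pass them through untouched.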
Steve