On 11/23/07, Tim Starling <tstarling(a)wikimedia.org> wrote:
Apostrophes are converted to HTML in doAllQuotes(). Invalid HTML on input
is cleaned up in removeHTMLtags(). Both are now considered to be *after*
the preprocessor. So in your example, the preprocessor will produce:
Crazy: ''italics'' '''open-bold <br />
stuff'''.
Ok, that's good then.
The only things that really need escaping from the preprocessor are the
characters "{|=}", and "<" when it occurs before the name of a registered
tag hook. For "|" there is the old hack {{!}}, a template which contains
just "|". This takes advantage of the uncovered syntax rules in the
preprocessor to hide a character from the preprocessor, passing through a
literal "|" to the main pass. It's used for table syntax. This mechanism
could be extended and standardised, say with a "urldecode" parser
function, to put any arbitrary character into the preprocessor output.
So, all those characters, if escaped, will appear to the main parser
unescaped, i.e. as if they had been typed in directly. That's also
good.
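To check my understanding of the {{!}} trick, here is a minimal sketch in Python. It is a toy model, not the real preprocessor: the names TEMPLATES, preprocess() and expand() and the "echo" template are made up for illustration, and I am assuming that arguments are split on "|" before nested templates are expanded. Under that assumption, a "|" produced by {{!}} is never treated as an argument separator and reaches the main pass as a literal character.

```python
# Toy model only: TEMPLATES, "echo", preprocess() and expand() are hypothetical.
TEMPLATES = {
    "!": "|",             # mirrors the {{!}} hack: a template containing just "|"
    "echo": "[{{{1}}}]",  # made-up template that wraps its first argument
}

def find_close(text, start):
    """Return the index just past the '}}' matching the '{{' at `start`, or -1."""
    depth, i = 0, start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1

def split_top_level(body):
    """Split on '|' only where it is not inside a nested {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair == "{{":
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "}}":
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts

def expand(body):
    """Expand one template call; arguments are split first, then expanded."""
    name, *args = split_top_level(body)
    args = [preprocess(arg) for arg in args]
    template = TEMPLATES.get(name.strip())
    if template is None:
        return "{{" + body + "}}"  # unknown template: leave the call untouched
    for number, value in enumerate(args, start=1):
        template = template.replace("{{{%d}}}" % number, value)
    return template

def preprocess(text):
    """Replace every top-level {{...}} call in `text` with its expansion."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("{{", i):
            end = find_close(text, i)
            if end != -1:
                out.append(expand(text[i + 2:end - 2]))
                i = end
                continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

In this model, preprocess("{{echo|a|b}}") treats "b" as a second argument and drops it, while preprocess("{{echo|a{{!}}b}}") passes "a|b" through as one argument.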
Tags such as <gallery> work by an uglier and more fragile method, i.e.
with strip markers. Strip markers are placeholders passed through to the
Ok, does the same go for <ref>? I haven't yet seen a <gallery> be
transcluded, but it could happen for <ref>. Presumably the parser will
have to be able to recognise and ignore the strip markers. These are
the "UNIQ" codes that get dotted in, yeah?
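For concreteness, this is roughly how I picture the strip-marker mechanism, as a Python sketch. Everything here is an assumption for illustration: the class and method names, the marker layout (a random "UNIQ..." prefix plus a counter) and the regex-based tag matching are not taken from the actual parser code.

```python
import re
import secrets

class StripState:
    """Hypothetical holder for stripped tag content, keyed by marker."""

    def __init__(self):
        # Random prefix so a marker is unlikely to collide with page text.
        self.prefix = "UNIQ" + secrets.token_hex(8)
        self.items = {}

    def strip(self, text, tag_names):
        """Replace each <tag>...</tag> block with an opaque placeholder."""
        names = "|".join(re.escape(name) for name in tag_names)
        pattern = re.compile(r"<(%s)\b[^>]*>(.*?)</\1>" % names, re.DOTALL)

        def stash(match):
            marker = "%s-%s-%08d-QINU" % (
                self.prefix, match.group(1), len(self.items))
            self.items[marker] = match.group(0)  # keep the original block
            return marker

        return pattern.sub(stash, text)

    def unstrip(self, text):
        """Put the stored blocks back in place of their markers."""
        for marker, original in self.items.items():
            text = text.replace(marker, original)
        return text
```

A later pass only has to leave anything starting with the marker prefix alone. For example, after state.strip("a <ref>x|y</ref> b", ["ref", "gallery"]) the "|" inside the <ref> is no longer visible to whatever runs next, and state.unstrip() restores the original text.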
Steve