On 14.05.2014 15:11, Gabriel Wicke wrote:
On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
This means that HTML returned from the preprocessor needs to be valid in
wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
possible, but my impression is that you are shooting for something that's
closer to the behavior of a tag extension. Those already bypass the
sanitizer, so would be less troublesome in the short term.
Yes. Just treat <html>...</html> like a tag extension, and it should work
fine.
Do you see any problems with that?
First of all you'll have to make sure that users cannot inject <html> tags
as that would enable arbitrary XSS. I might have missed it, but I believe
that this is not yet done in your current patch.
My patch doesn't change the handling of <html>...</html> by the parser. As
before, the parser will pass HTML code in <html>...</html> through only if
$wgRawHtml is enabled, and will mangle/sanitize it otherwise.
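
To illustrate the gating described above, here is a minimal sketch in Python (not MediaWiki's actual PHP code). The function name and the raw_html_enabled flag are hypothetical stand-ins for the parser and for $wgRawHtml, and escaping the whole tag is a simplification of what the real sanitizer does:

```python
import html
import re

# Matches a simple, non-nested <html>...</html> block.
HTML_TAG_RE = re.compile(r"<html>(.*?)</html>", re.DOTALL | re.IGNORECASE)


def render_html_tags(wikitext: str, raw_html_enabled: bool = False) -> str:
    """Pass <html> blocks through only when raw HTML is trusted.

    raw_html_enabled is a hypothetical stand-in for $wgRawHtml.
    """
    def handle(match: re.Match) -> str:
        if raw_html_enabled:
            # Trusted wiki: emit the inner HTML unchanged.
            return match.group(1)
        # Untrusted wiki: neutralise the whole block, tags included,
        # so that it is displayed as text instead of being interpreted.
        return html.escape(match.group(0))

    return HTML_TAG_RE.sub(handle, wikitext)
```

With the flag off, user-supplied <html><script>...</script></html> comes out as escaped text, which is the property that matters for the XSS concern raised above.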
My patch does mean, however, that the text returned by expandtemplates may not
render as expected when processed by the parser. Perhaps anomie's approach of
preserving the original template call would work, something like:
<html template="{{T}}">...</html>
Then, the parser could apply the normal expansion when encountering the tag,
ignoring the pre-rendered HTML.
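
A rough sketch of that idea, again in Python rather than parser code: before normal expansion, each pre-rendered block is swapped back for the template call stored in its attribute. The function name is hypothetical, and the assumption that the attribute value is HTML-escaped is mine, not something the patch specifies:

```python
import html
import re

# Matches <html template="...">...</html>. Assumes the pre-rendered body
# contains no nested </html> and the attribute value has no raw quotes.
PRERENDERED_RE = re.compile(
    r'<html\s+template="([^"]*)">.*?</html>', re.DOTALL
)


def restore_template_calls(text: str) -> str:
    """Replace each pre-rendered block with its original template call."""
    return PRERENDERED_RE.sub(
        # Drop the rendered HTML; keep only the (unescaped) template call.
        lambda m: html.unescape(m.group(1)),
        text,
    )
```

For example, restore_template_calls('Intro <html template="{{T}}"><p>rendered</p></html> outro') gives back 'Intro {{T}} outro', which the parser can then expand as usual.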
In contrast to normal tag extensions, <html> would also contain fully
rendered HTML, and should not be piped through action=parse as is done in
Parsoid for tag extensions (in absence of a direct tag extension expansion
API end point). We and other users of the expandtemplates API will have to
add special-case handling for this pseudo tag extension.
Handling for the <html> tag should already be in place, since it's part of the
core spec. The issue is only to know when to allow/trust such <html> tags, and
when to treat them as plain text (or like a <pre> tag).
In HTML, the <html> tag is also not meant to be used inside the body of a
page. I'd suggest using a different tag name to avoid issues with HTML
parsers and potential name conflicts with existing tag extensions.
As above: <html> is part of the core syntax, to support $wgRawHtml. It's just
disabled by default.
Overall it does not feel like a very clean way to do this. My preference
would be to let the consumer directly ask for pre-expanded wikitext *or*
HTML, without overloading action=expandtemplates.
The question is how to represent non-wikitext transclusions in the output of
expandtemplates. We'll need an answer to this question in any case.
For the main purpose of my patch, expandtemplates is irrelevant. I added the
special mode that generates <html> specifically to have a consistent wikitext
representation for use by expandtemplates. I could simply disable it just as
well, so no expansion would apply for such templates when calling
expandtemplates (as is done for special page inclusion).
Even indicating the content type explicitly in the API response (rather than
inline with an HTML
tag) would be a better stop-gap as it would avoid some of the security and
compatibility issues described above.
The content type did not change. It's wikitext.
-- daniel