On 14.05.2014 15:11, Gabriel Wicke wrote:
On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
This means that HTML returned from the preprocessor needs to be valid in wikitext to avoid being stripped out by the sanitizer. Maybe that's actually possible, but my impression is that you are shooting for something that's closer to the behavior of a tag extension. Those already bypass the sanitizer, so would be less troublesome in the short term.
Yes. Just treat <html>...</html> like a tag extension, and it should work fine. Do you see any problems with that?
First of all you'll have to make sure that users cannot inject <html> tags as that would enable arbitrary XSS. I might have missed it, but I believe that this is not yet done in your current patch.
My patch doesn't change the handling of <html>...</html> by the parser. As before, the parser will pass HTML code in <html>...</html> through only if wgRawHtml is enabled, and will mangle/sanitize it otherwise.
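To make the gating concrete, here is a minimal sketch in Python of the behavior described above. It is only a model, not MediaWiki's actual PHP code path: the function name `render_html_blocks` and the boolean `raw_html_enabled` (standing in for wgRawHtml) are hypothetical. The escaping branch is a simplification of "mangle/sanitize", and the regex does not handle nested or malformed blocks.

```python
import html
import re

# Matches <html>...</html> blocks non-greedily; DOTALL lets the body span lines.
HTML_TAG_RE = re.compile(r"<html>(.*?)</html>", re.DOTALL)


def render_html_blocks(text: str, raw_html_enabled: bool) -> str:
    """Hypothetical model of the wgRawHtml gate.

    If raw_html_enabled is True, the body of each <html> block is passed
    through verbatim. Otherwise the whole block, tags included, is
    HTML-escaped so it renders as plain text instead of live markup.
    """
    def replace(match: re.Match) -> str:
        if raw_html_enabled:
            return match.group(1)          # trusted: emit raw HTML body
        return html.escape(match.group(0))  # untrusted: neutralize markup

    return HTML_TAG_RE.sub(replace, text)
```

With the flag off, `<html><b>x</b></html>` comes out as escaped text, so injected markup cannot execute.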
My patch does mean, however, that the text returned by expandtemplates may not render as expected when processed by the parser. Perhaps anomie's approach of preserving the original template call would work, something like:
<html template="{{T}}">...</html>
Then, the parser could apply the normal expansion when encountering the tag, ignoring the pre-rendered HTML.
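A rough sketch of that idea, again in Python and under assumptions: the attribute name `template` is taken from the example above, while the function `restore_template_calls` is hypothetical, and it assumes the attribute value is double-quoted and HTML-entity-encoded. It replaces each annotated block with its original template call and throws away the pre-rendered HTML, so normal expansion can run on the result.

```python
import html
import re

# <html template="...">...</html>; group 1 captures the attribute value.
TEMPLATE_HTML_RE = re.compile(r'<html template="([^"]*)">.*?</html>', re.DOTALL)


def restore_template_calls(wikitext: str) -> str:
    """Replace each annotated <html> block with its original template call.

    The pre-rendered HTML inside the block is discarded; entity-encoded
    characters in the attribute (e.g. &quot;) are decoded back to wikitext.
    """
    return TEMPLATE_HTML_RE.sub(lambda m: html.unescape(m.group(1)), wikitext)
```

For example, `x <html template="{{T}}"><p>rendered</p></html> y` becomes `x {{T}} y`, which the parser can then expand as usual.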
In contrast to normal tag extensions, <html> would also contain fully rendered HTML, and should not be piped through action=parse as is done in Parsoid for tag extensions (in the absence of a direct API end point for tag extension expansion). We and other users of the expandtemplates API will have to add special-case handling for this pseudo tag extension.
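As an illustration of what such special-case handling might look like on the client side, here is a hypothetical Python helper, `split_expanded`, that partitions expandtemplates output into wikitext segments (which could still go through action=parse) and raw HTML segments (which should be used as-is). It is a sketch only: it assumes <html> blocks are not nested and uses a regex rather than a real HTML or wikitext parser.

```python
import re
from typing import List, Tuple

# Opening tag may carry attributes (e.g. template="{{T}}"); body is group 1.
HTML_BLOCK_RE = re.compile(r"<html(?:\s[^>]*)?>(.*?)</html>", re.DOTALL)


def split_expanded(text: str) -> List[Tuple[str, str]]:
    """Split expanded text into ("wikitext", ...) and ("html", ...) parts."""
    parts: List[Tuple[str, str]] = []
    pos = 0
    for m in HTML_BLOCK_RE.finditer(text):
        if m.start() > pos:
            # Wikitext between blocks can still be sent to action=parse.
            parts.append(("wikitext", text[pos:m.start()]))
        # Pre-rendered HTML: must not be piped through the parser again.
        parts.append(("html", m.group(1)))
        pos = m.end()
    if pos < len(text):
        parts.append(("wikitext", text[pos:]))
    return parts
```

A consumer would then render each "wikitext" part normally and splice the "html" parts in unchanged.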
Handling for the <html> tag should already be in place, since it's part of the core spec. The issue is only to know when to allow/trust such <html> tags, and when to treat them as plain text (or like a <pre> tag).
In HTML, the <html> tag is also not meant to be used inside the body of a page. I'd suggest using a different tag name to avoid issues with HTML parsers and potential name conflicts with existing tag extensions.
As above: <html> is part of the core syntax, to support $wgRawHtml. It's just disabled per default.
Overall it does not feel like a very clean way to do this. My preference would be to let the consumer directly ask for pre-expanded wikitext *or* HTML, without overloading action=expandtemplates.
The question is how to represent non-wikitext transclusions in the output of expandtemplates. We'll need an answer to this question in any case.
For the main purpose of my patch, expandtemplates is irrelevant. I added the special mode that generates <html> specifically to have a consistent wikitext representation for use by expandtemplates. I could just as well simply disable it, so that no expansion would apply to such templates when calling expandtemplates (as is done for special page inclusion).
Even indicating the content type explicitly in the API response (rather than inline with an HTML tag) would be a better stop-gap as it would avoid some of the security and compatibility issues described above.
The content type did not change. It's wikitext.
-- daniel