I'm getting the impression there is a fundamental misunderstanding here.
On 18.05.2014 04:28, Subramanya Sastry wrote:
So, consider this wikitext for page P.
== Foo ==
{{wikitext-transclusion}}
*a1
<map ..> ... </map>
*a2
{{T}} (the html-content-model-transclusion)
*a3
Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses it and injects the tokens into P's content. Parsoid gets HTML from the API for <map ..> ... </map> and injects the HTML into the not-fully-processed wikitext of P (by adding an appropriate token wrapper). So, if {{T}} returns HTML (i.e. the MW API lets Parsoid know that it is HTML), Parsoid can inject the HTML into the not-fully-processed wikitext and ensure that the final output comes out right (in this case, the HTML from both the map extension and {{T}} would, as intended, not get sanitized).
Does that help explain why we said we don't need the html wrapper?
No, it actually misses my point completely. My point is that this may work with the way Parsoid uses expandtemplates, but it does not work for expandtemplates in general, because expandtemplates takes full wikitext as input and replaces only parts of it.
So, let me phrase it this way:
If expandtemplates is called with text=
== Foo ==
{{T}}
[[Category:Bla]]
What should it return, and what content type should be declared in the HTTP header?
Note that I'm not talking about how Parsoid processes this text. That's not my point - my point is that expandtemplates can be and is used on full wikitext. In that context, the return type cannot be HTML.
All that said, if you want to provide the wrapper with <html model="whatever" ....>fully-expanded-HTML</html>, we can handle that as well. We'll use the model attribute of the wrapper, discard the wrapper and use the contents in our pipeline.
Why use the model attribute? Why would you care about the original model? All you need to know is that you'll get HTML. Exposing the original model in this context seems useless if not misleading. <html transclude="{{T}}"></html> would give the backend parser a way to discard the HTML (as unsafe) and execute the transclusion instead (generating trusted HTML). In fact, we could just omit the content of the <html> tag.
So, model information either as an attribute on the wrapper, api response header, or a property in the JSON/XML response structure would all work for us.
As explained above, the return type cannot be HTML for the full text, because any "plain" wikitext would stay unprocessed. There needs to be a marker for "html transclusion *here*" in the text.
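To make the "marker in the text" idea concrete, here is a minimal sketch of how a consumer could split mixed expandtemplates output into plain wikitext and HTML-transclusion markers. The <html transclude="..."> syntax is only the proposal from this thread, not an existing MediaWiki format, and split_expanded is a hypothetical helper name.

```python
import re

# Hypothetical marker syntax from this discussion; not an existing MediaWiki format.
MARKER = re.compile(
    r'<html\s+transclude="(?P<target>[^"]*)"\s*>.*?</html>',
    re.DOTALL,
)

def split_expanded(text):
    """Split output into ('wikitext', str) and ('html-transclusion', target) parts.

    Any content inside the marker is ignored: only the transclusion target
    is kept, so the consumer can re-run the transclusion itself.
    """
    segments = []
    pos = 0
    for match in MARKER.finditer(text):
        if match.start() > pos:
            segments.append(("wikitext", text[pos:match.start()]))
        segments.append(("html-transclusion", match.group("target")))
        pos = match.end()
    if pos < len(text):
        segments.append(("wikitext", text[pos:]))
    return segments

expanded = '== Foo ==\n<html transclude="{{T}}"></html>\n[[Category:Bla]]'
for kind, value in split_expanded(expanded):
    print(kind, repr(value))
```

The heading and the category link remain wikitext, which is why a single content type for the whole response cannot describe the result.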
On 18.05.2014 16:29, Gabriel Wicke wrote:
The difference between wrapper and property is actually that using inline wrappers in the returned wikitext would force us to escape similar wrappers from normal template content to avoid opening a gaping XSS hole.
Please explain, I do not see the hole you mention.
If the input contained <html>evil stuff</html>, it would just get escaped by the preprocessor (unless $wgRawHtml is enabled), as it is now: https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3...
If <html transclude="{{T}}"> was passed, the parser/preprocessor would treat it like it would treat {{T}} - it would get trusted, backend-generated HTML from the respective Content object.
I see no change, and no opportunity to inject anything. Am I missing something?
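A toy sketch of the two cases, under strong simplifying assumptions: everything outside a marker is simply HTML-escaped (the real preprocessor parses wikitext, it does not escape all of it), and render_trusted is a hypothetical callback standing in for the backend rendering of the transcluded page.

```python
import html
import re

# Hypothetical marker syntax from this discussion; not an existing MediaWiki format.
MARKER = re.compile(r'<html\s+transclude="([^"]*)"\s*>.*?</html>', re.DOTALL)

def resolve(text, render_trusted):
    """Replace each marker with backend-generated HTML; escape everything else.

    The body of a marker is never used, so HTML supplied by the caller
    cannot reach the output unescaped.
    """
    out = []
    pos = 0
    for match in MARKER.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        out.append(render_trusted(match.group(1)))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)

def fake_backend(target):
    # Stand-in for trusted HTML produced by the respective Content object.
    return "<b>trusted</b>"

print(resolve('<html>evil</html> <html transclude="{{T}}">ignored</html>', fake_backend))
```

In this model, passing a marker by hand gains an attacker nothing over writing {{T}} directly, since only the target is honoured.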
A separate property in the JSON/XML structure avoids the need for escaping (and associated security risks if not done thoroughly), and should be relatively straightforward to implement and consume.
As explained above, I do not see how this would work except for the very special case of using expandtemplates to expand just a single template. This could be solved by introducing a new, single template mode for expandtemplates, e.g. using expand="Foo|x|y|z" instead of text="{{Foo|x|y|z}}".
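For illustration, the two request shapes could look roughly like this. The text parameter is what expandtemplates takes today; the expand parameter is purely the hypothetical single-template mode suggested above and does not exist in the API. The helper names are made up for this sketch.

```python
from urllib.parse import urlencode

def text_mode_params(name, args):
    """Existing mode: pass full wikitext, here a single template call."""
    wikitext = "{{" + "|".join([name, *args]) + "}}"
    return {"action": "expandtemplates", "format": "json", "text": wikitext}

def single_template_params(name, args):
    """Hypothetical mode: 'expand' is NOT an existing API parameter."""
    # Note: arguments containing '|' would need some escaping scheme.
    return {
        "action": "expandtemplates",
        "format": "json",
        "expand": "|".join([name, *args]),
    }

print(urlencode(text_mode_params("Foo", ["x", "y", "z"])))
print(urlencode(single_template_params("Foo", ["x", "y", "z"])))
```

With a single-template mode the response describes exactly one transclusion, so a single content type for the whole result would be meaningful.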
Another way would be to use hints in the structure returned by generatexml. There, we have an opportunity to declare a content type for a *part* of the output (or rather, input).
-- daniel