Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

19 May 2014


I'm getting the impression there is a fundamental misunderstanding here.
On 18.05.2014 04:28, Subramanya Sastry wrote:
...
So, consider this wikitext for page P.
== Foo ==
{{wikitext-transclusion}}
  *a1
<map ..> ... </map>
  *a2
{{T}} (the html-content-model-transclusion)
  *a3
Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses it and
injects the tokens into the P's content. Parsoid gets HTML from the API for
<map..>...</map> and injects the HTML into the not-fully-processed wikitext of P
(by adding an appropriate token wrapper). So, if {{T}} returns HTML (i.e. the MW
API lets Parsoid know that it is HTML), Parsoid can inject the HTML into the
not-fully-processed wikitext and ensure that the final output comes out right
(in this case, the HTML from both the map extension and {{T}} would not get
sanitized as it should be).
Does that help explain why we said we don't need the html wrapper?
No, it actually misses my point completely. My point is that this may work with
the way parsoid uses expandtemplates, but it does not work for expandtemplates
in general. Because expandtemplates takes full wikitext as input, and only
partially replaces it.
So, let me phrase it this way:
If expandtemplates is called with text=
== Foo ==
   {{T}}
[[Category:Bla]]
What should it return, and what content type should be declared in the HTTP header?
Note that I'm not talking about how parsoid processes this text. That's not my
point - my point is that expandtemplates can be and is used on full wikitext. In
that context, the return type cannot be HTML.
...
All that said, if you want to provide the wrapper with <html model="whatever"
....>fully-expanded-HTML</html>, we can handle that as well. We'll use the model
attribute of the wrapper, discard the wrapper and use the contents in our pipeline.
Why use the model attribute? Why would you care about the original model? All
you need to know is that you'll get HTML. Exposing the original model in this
context seems useless if not misleading. <html transclude="{{T}}"></html> would
give that backend parser a way to discard the HTML (as unsafe) and execute the
transclusion instead (generating trusted HTML). In fact, we could just omit the
content of the <html> tag.
...
So, model information either as an attribute on the wrapper, api response
header, or a property in the JSON/XML response structure would all work for us.
As explained above, the return type cannot be HTML for the full text, because
any "plain" wikitext would stay unprocessed. There needs to be a marker for
"html transclusion *here*" in the text.
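To illustrate the point about needing an in-text marker, here is a minimal sketch of a toy expander. It is not MediaWiki's preprocessor: the `PAGES` table, the `expand_templates` function and the sample HTML are all made up for illustration. It only shows that, when the input is full wikitext, the headings and category links stay unprocessed, so the result as a whole is still wikitext and the HTML part has to be flagged in place.

```python
import re

# Hypothetical page store: title -> (content model, expansion result).
PAGES = {
    "Greeting": ("wikitext", "'''Hello'''"),
    "T": ("html", '<div class="chart">42</div>'),
}

# Matches simple, argument-free transclusions such as {{T}}.
TRANSCLUSION = re.compile(r"\{\{([^{}|]+)\}\}")


def expand_templates(text):
    """Toy stand-in for expandtemplates: only transclusions are replaced."""

    def replace(match):
        title = match.group(1).strip()
        if title not in PAGES:
            return match.group(0)  # unknown template: leave untouched
        model, result = PAGES[title]
        if model == "wikitext":
            return result  # wikitext expands inline, as usual
        # HTML content model: mark where the HTML transclusion happened.
        return '<html transclude="{{%s}}">%s</html>' % (title, result)

    return TRANSCLUSION.sub(replace, text)


page = "== Foo ==\n{{T}}\n[[Category:Bla]]"
expanded = expand_templates(page)
print(expanded)
```

The heading and the category link come back unchanged, which is why a single content type of HTML for the whole response would be wrong in this scenario.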
On 18.05.2014 16:29, Gabriel Wicke wrote:
...
The difference between wrapper and property is actually that using inline
wrappers in the returned wikitext would force us to escape similar wrappers
from normal template content to avoid opening a gaping XSS hole.
Please explain, I do not see the hole you mention.
If the input contained <html>evil stuff</html>, it would just get escaped by the
preprocessor (unless $wgRawHtml is enabled), as it is now:
https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3...
If <html transclude="{{T}}"> was passed, the parser/preprocessor would treat it
like it would treat {{T}} - it would get trusted, backend-generated HTML from
the respective Content object.
I see no change, and no opportunity to inject anything. Am I missing something?
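The behaviour described above can be sketched roughly as follows. This is a toy model, not the actual preprocessor: `preprocess`, `TRUSTED_HTML` and the `raw_html_enabled` flag (standing in for $wgRawHtml) are hypothetical names. A literal <html> block in the input gets escaped, while the transclude form ignores whatever body it was given and substitutes HTML generated by the backend.

```python
import html
import re

# Hypothetical trusted backend: title -> HTML generated from the Content object.
TRUSTED_HTML = {"T": "<div>trusted</div>"}

# Matches <html>...</html>, optionally with a transclude="{{Title}}" attribute.
HTML_TAG = re.compile(
    r'<html(?:\s+transclude="\{\{([^{}|"]+)\}\}")?\s*>(.*?)</html>',
    re.DOTALL,
)


def preprocess(text, raw_html_enabled=False):
    """Toy preprocessor pass over <html> tags (illustration only)."""

    def replace(match):
        title, body = match.group(1), match.group(2)
        if title is not None and title in TRUSTED_HTML:
            # The supplied body is discarded; the transclusion is re-executed.
            return TRUSTED_HTML[title]
        if raw_html_enabled:
            return body  # raw HTML explicitly allowed by configuration
        return html.escape(match.group(0))  # default: escape the whole tag

    return HTML_TAG.sub(replace, text)


print(preprocess("<html>evil stuff</html>"))
print(preprocess('<html transclude="{{T}}"><script>x</script></html>'))
```

In this sketch the attacker-supplied body never reaches the output in either case, which is the argument being made here; whether the real preprocessor could be made to behave this way is exactly what is under discussion.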
...
A separate property in the JSON/XML structure avoids the need for escaping
(and associated security risks if not done thoroughly), and should be
relatively straightforward to implement and consume.
As explained above, I do not see how this would work except for the very special
case of using expandtemplates to expand just a single template. This could be
solved by introducing a new, single template mode for expandtemplates, e.g.
using expand="Foo|x|y|z" instead of text="{{Foo|x|y|z}}".
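As a rough sketch of the difference between the two call styles, the following builds the query strings for both. The `text` parameter is the existing one; `expand` is only the parameter proposed in this message and does not exist in the API, and the helper names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def full_text_query(wikitext):
    """Existing style: arbitrary wikitext goes into text=."""
    return urlencode(
        {"action": "expandtemplates", "format": "json", "text": wikitext}
    )


def single_template_query(title, *args):
    """Proposed style (hypothetical 'expand' parameter): one template only."""
    return urlencode(
        {
            "action": "expandtemplates",
            "format": "json",
            "expand": "|".join((title,) + args),
        }
    )


print(full_text_query("{{Foo|x|y|z}}"))
print(single_template_query("Foo", "x", "y", "z"))
```

With the single-template form, the response corresponds to exactly one transclusion, so a content type declared in a header or response property would describe the whole result.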
Another way would be to use hints in the structure returned by generatexml. There,
we have an opportunity to declare a content type for a *part* of the output (or
rather, input).
-- daniel
