Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

17 May 2014

(Top posting to quickly summarize what I gathered from the discussion 
and what would be required for Parsoid to expand pages with these 
transclusions).

Parsoid currently relies on the MediaWiki API to preprocess 
transclusions and return wikitext (it uses action=expandtemplates for 
this), which it then parses using the native Parsoid pipeline.  Parsoid 
processes extension tags via action=parse and weaves the result back 
into the top-level content of the page.
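
For readers unfamiliar with the two API calls, here is a minimal sketch 
of how such requests could be built. It is not Parsoid's actual code: 
the endpoint URL is a placeholder, the function names are made up for 
illustration, and the parameter names follow the current MediaWiki API 
documentation, which may differ from what was deployed at the time.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute a real wiki's api.php.
API = "https://example.org/w/api.php"


def expandtemplates_url(wikitext, title):
    """Build a request asking MediaWiki to preprocess transclusions
    in `wikitext` and return the expanded wikitext."""
    params = {
        "action": "expandtemplates",
        "text": wikitext,
        "title": title,      # page used as context for the expansion
        "prop": "wikitext",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def parse_extension_url(wikitext, title):
    """Build a request asking MediaWiki to render a snippet (for example
    an extension tag) to HTML via action=parse."""
    params = {
        "action": "parse",
        "text": wikitext,
        "title": title,
        "prop": "text",
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

Calling expandtemplates_url("{{T}}", "P") yields a URL whose query 
string carries action=expandtemplates and the URL-encoded {{T}}.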

As per your original email, I am assuming that T is a page with a 
special content model that generates HTML, and that another page P has a 
transclusion {{T}}.

So, when Parsoid encounters {{T}}, it should be able to replace {{T}} 
with the HTML to generate the right parse output for P.

So, I am listing below 4 possible ways action=expandtemplates can 
process {{T}}:

1. Your newest implementation (that just returns back {{T}}):

* If Parsoid gets back {{T}}, one of two things can happen:
--- Parsoid, as usual, tries to parse it as wikitext, and it gets stuck 
in an infinite loop (query MW api for expansion of {{T}}, get back 
{{T}}, parse it as {{T}}, query MW api for expansion of {{T}}, .... ). 
So, this will definitely not work.
--- Parsoid adds a special-case check to see if the API sent back {{T}} 
and, in that case, queries a different API endpoint 
(action=expandtohtml maybe?) for the HTML expansion. This relies on an 
assumption about the output of expandtemplates (that unchanged output 
means T could not be expanded to wikitext). This would work and would 
require the new endpoint to be implemented, but feels hacky.
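
The two outcomes under option 1 can be sketched as follows. This is an 
illustration only: expand_via_api is a local stand-in for 
action=expandtemplates (no network call is made), fetch_html stands in 
for the hypothetical second endpoint, and the request cap exists purely 
so that the demonstration terminates.

```python
def expand_via_api(wikitext):
    """Stand-in for action=expandtemplates under option 1: a transclusion
    of an HTML-generating page comes back unchanged."""
    return wikitext


def expand_naively(wikitext, max_requests=5):
    """Keep re-expanding while the result still looks like a transclusion.
    With the echo behaviour above this never converges, hence the cap."""
    requests = 0
    while wikitext.startswith("{{") and wikitext.endswith("}}"):
        if requests == max_requests:
            raise RuntimeError("expansion did not converge")
        wikitext = expand_via_api(wikitext)
        requests += 1
    return wikitext


def expand_with_guard(wikitext, fetch_html):
    """Special-case check: unchanged output is taken to mean 'cannot be
    expanded to wikitext', so a second (hypothetical) endpoint is asked
    for the HTML expansion instead."""
    expanded = expand_via_api(wikitext)
    if expanded == wikitext:
        return ("html", fetch_html(wikitext))
    return ("wikitext", expanded)
```

The guard works only as long as "output equals input" reliably signals 
a non-wikitext page, which is the assumption that makes it feel hacky.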

So, going back to your original implementation, here are at least 3 ways 
I see this working:

2. action=expandtemplates returns <html>...</html> for the expansion of 
{{T}}, but also provides an additional API response header that tells 
Parsoid that T was a special content model page and that the raw HTML 
it received should not be sanitized.

3. action=expandtemplates returns <html>...</html> for the expansion of 
{{T}} and gives no other indication of whether T is a special content 
model page. However, if Parsoid (and other clients) are to always trust 
this HTML output without sanitization, the expandtemplates 
implementation should conditionally sanitize <html> tags encountered in 
wikitext to prevent XSS. As far as I understand, expandtemplates (on 
master, not your patch) does not do this tag sanitization. But, 
independent of that, what Parsoid and other clients need is a guarantee 
that it is safe to blindly splice in the contents of any 
<html>...</html> they receive for any {{T}}, no matter what content 
model T implements.

4. Parsoid first queries the MW-api to find out the content model of T 
for every transclusion {{T}} it encounters on the page P and based on 
the content-model info, knows how to process the output of 
action=expandtemplates.

Clearly 4. is expensive and 3. seems hacky, but if it can be made to 
work, we can work with that.
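
To make the cost of option 4 concrete, here is a rough sketch of the 
request plan it implies. The helper names are hypothetical and the 
transclusion matcher is deliberately naive (simple {{Title}} forms only, 
no parameters or nesting). It assumes the content model is read from 
action=query with prop=info.

```python
import re


def find_transclusions(wikitext):
    """Return titles of simple {{Title}} transclusions. Transclusions with
    parameters or nested braces are ignored in this sketch."""
    return re.findall(r"\{\{([^{}|]+)\}\}", wikitext)


def plan_requests(wikitext):
    """List the API requests option 4 would issue: one content-model
    lookup plus one expansion for every transclusion on the page."""
    plan = []
    for title in find_transclusions(wikitext):
        plan.append(("query", {"action": "query",
                               "prop": "info",
                               "titles": title}))
        plan.append(("expand", {"action": "expandtemplates",
                                "text": "{{" + title + "}}"}))
    return plan
```

A page with N such transclusions needs 2N requests in this sketch 
instead of N, which is where the extra expense comes from.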

But both Gabriel and I think that solution 2. is the cleanest solution 
that would work for now. The PHP parser (in your patch to handle {{T}}) 
already has information about the content model of T when it is 
expanding {{T}}, and it seems simplest and cleanest to return this 
information to clients for expansions of pages with a non-default 
content model. That gives clients like Parsoid the cleanest way of 
handling these.
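
On the client side, solution 2 could look roughly like the sketch 
below. The response is modelled as a plain dict, and both key names 
("expansion" and "contentmodel") are assumptions made for illustration; 
the actual indicator would be whatever the API ends up exposing.

```python
def handle_expansion(response):
    """Decide how to treat an expandtemplates result.

    `response` is a dict with the expanded text under "expansion" and,
    for non-wikitext pages, a hypothetical "contentmodel" indicator.
    """
    model = response.get("contentmodel", "wikitext")
    body = response["expansion"]
    if model == "wikitext":
        # Default case: feed the result to the normal wikitext pipeline,
        # where any literal <html> tag is handled like other user input.
        return ("parse-as-wikitext", body)
    # Non-default content model: the server vouches for this HTML, so
    # strip the wrapper (if present) and splice it in without sanitizing.
    prefix, suffix = "<html>", "</html>"
    if body.startswith(prefix) and body.endswith(suffix):
        body = body[len(prefix):-len(suffix)]
    return ("splice-html", body)
```

The point of the explicit indicator is that identical <html>...</html> 
text takes the unsanitized path only when the server says it came from 
a special content model page.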

If I am missing something or this is unclear, or if this is getting into 
too much back and forth on email and it is simpler to discuss on IRC, I 
can hop onto any IRC channel on Monday, or we can do this on 
#mediawiki-parsoid, and one of us could later summarize the discussion 
back onto this thread.

Thanks,
Subbu.

On 05/17/2014 02:54 AM, Daniel Kinzler wrote:
...
> On 16.05.2014 21:07, Gabriel Wicke wrote:
>> On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
>>> The one thing that will not work on wikis with $wgRawHtml disabled
>>> is parsing the output of expandtemplates.
>>
>> Yes, which means that it won't work with Parsoid, Flow, VE and other
>> users.
>
> And it has been fixed now. In the latest version, expandtemplates will
> just return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.
>
>> I do think that we can do better, and I pointed out possible ways to
>> do so in my earlier mail:
>>
>>> My preference would be to let the consumer directly ask for
>>> pre-expanded wikitext *or* HTML, without overloading
>>> action=expandtemplates. Even indicating the content type explicitly
>>> in the API response (rather than inline with an HTML tag) would be a
>>> better stop-gap as it would avoid some of the security and
>>> compatibility issues described above.
>
> I don't quite understand what you are asking for... action=parse
> returns HTML, action=expandtemplates returns wikitext. The issue was
> with "mixed" output, that is, representing the expansion of templates
> that generate HTML in wikitext. The solution I'm going for now is to
> simply not expand them.
>
> -- daniel
