(Top posting to quickly summarize what I gathered from the discussion and what would be required for Parsoid to expand pages with these transclusions).
Parsoid currently relies on the MediaWiki API to preprocess transclusions and return wikitext (via action=expandtemplates), which it then parses using the native Parsoid pipeline. Parsoid processes extension tags via action=parse and weaves the result back into the top-level content of the page.
As per your original email, I am assuming that T is a page with a special content model that generates HTML, and that another page P has a transclusion {{T}}.
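To make the current flow concrete, here is a rough sketch of the kind of request involved. It only builds the URL and does not contact a wiki. The api.php location and the helper name are made up for illustration, and this is not Parsoid's actual code.

```python
from urllib.parse import urlencode


def expandtemplates_url(api_base: str, page_title: str, wikitext: str) -> str:
    """Build a GET URL asking MediaWiki to preprocess `wikitext`
    in the context of the page `page_title`."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "title": page_title,  # page the transclusion appears on (P)
        "text": wikitext,     # the transclusion to expand, e.g. {{T}}
    }
    return api_base + "?" + urlencode(params)


# Hypothetical wiki endpoint, used only as a placeholder.
url = expandtemplates_url("https://example.org/w/api.php", "P", "{{T}}")
print(url)
```

The wikitext that comes back from such a request is what Parsoid then feeds into its own pipeline.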
So, when Parsoid encounters {{T}}, it should be able to replace {{T}} with the HTML to generate the right parse output for P.
So, I am listing below 4 possible ways action=expandtemplates can process {{T}}:
1. Your newest implementation (that just returns back {{T}}):
* If Parsoid gets back {{T}}, one of two things can happen:
  --- Parsoid, as usual, tries to parse it as wikitext and gets stuck in an infinite loop (query the MW API for the expansion of {{T}}, get back {{T}}, parse it as {{T}}, query the MW API for the expansion of {{T}}, ...). So, this will definitely not work.
  --- Parsoid adds a special-case check to see whether the API sent back {{T}} unchanged and, if so, calls a different API endpoint (action=expandtohtml maybe?) to get the HTML expansion. This relies on an assumption about the output of expandtemplates. It would work, and would require the new endpoint to be implemented, but feels hacky.
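A minimal sketch of the special-case check in option 1, assuming the API is stood in for by a plain function. The names expand_with_guard, echo_api and wikitext_api are invented for illustration, and no real endpoint is called.

```python
from typing import Callable, Tuple


def expand_with_guard(transclusion: str,
                      expand: Callable[[str], str]) -> Tuple[str, str]:
    """Expand once and report whether the API echoed the input back."""
    result = expand(transclusion)
    if result.strip() == transclusion.strip():
        # Echoed back unchanged: re-parsing this as wikitext would issue the
        # same query again (the infinite loop), so stop and let the caller
        # obtain HTML some other way.
        return "needs-html", transclusion
    return "wikitext", result


def echo_api(text: str) -> str:
    # Stand-in for an expandtemplates that returns {{T}} untouched.
    return text


def wikitext_api(text: str) -> str:
    # Stand-in for an ordinary template that expands to wikitext.
    return "'''expanded'''"


print(expand_with_guard("{{T}}", echo_api))
print(expand_with_guard("{{T}}", wikitext_api))
```

The weak point is the string comparison: the client has to infer "this is not wikitext" from the shape of the output, which is the part that feels hacky.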
So, going back to your original implementation, here are at least 3 ways I see this working:
2. action=expandtemplates returns <html>...</html> for the expansion of {{T}}, but also provides an additional API response header that tells Parsoid that T was a special content model page and that the raw HTML it received should not be sanitized.
3. action=expandtemplates returns <html>...</html> for the expansion of {{T}} and no other indication of whether T is a special content model page. However, if Parsoid (and other clients) are to always trust this HTML output without sanitization, the expandtemplates implementation should conditionally sanitize <html> tags encountered in wikitext to prevent XSS. As far as I understand, expandtemplates (on master, not your patch) does not do this tag sanitization. But, independent of that, what Parsoid and other clients need is a guarantee that it is safe to blindly splice in the contents of any <html>...</html> they receive for any {{T}}, no matter what content model T implements.
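Here is how a client might act on option 2, as a sketch only. The header name X-Expanded-Content-Model is purely hypothetical (nothing like it exists in the API today), and the model value in the usage lines is a placeholder.

```python
from typing import Mapping, Tuple

# Hypothetical header name, invented for this sketch.
CONTENT_MODEL_HEADER = "X-Expanded-Content-Model"

OPEN_TAG, CLOSE_TAG = "<html>", "</html>"


def classify_expansion(body: str, headers: Mapping[str, str]) -> Tuple[str, str]:
    """Decide how to treat an expandtemplates result under option 2."""
    model = headers.get(CONTENT_MODEL_HEADER, "wikitext")
    wrapped = body.startswith(OPEN_TAG) and body.endswith(CLOSE_TAG)
    if model != "wikitext" and wrapped:
        # The server vouches for this HTML: unwrap it and splice it in as-is.
        return "trusted-html", body[len(OPEN_TAG):-len(CLOSE_TAG)]
    # No signal from the server: treat the body as ordinary wikitext, so any
    # <html> tag in it goes through the normal (sanitizing) path.
    return "wikitext", body


print(classify_expansion("<html><b>x</b></html>",
                         {CONTENT_MODEL_HEADER: "some-html-model"}))
print(classify_expansion("<html><b>x</b></html>", {}))
```

The point is that the trust decision rests on an explicit signal from the server rather than on the presence of an <html> tag in the text.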
4. Parsoid first queries the MW-api to find out the content model of T for every transclusion {{T}} it encounters on the page P and based on the content-model info, knows how to process the output of action=expandtemplates.
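For option 4, the lookup could go through action=query with prop=info, which as far as I know reports a contentmodel field per page. The sketch below only builds the URL and reads a hand-written sample response. The endpoint, the titles and the sample data are placeholders, and the response shape is my assumption about the default JSON format.

```python
from typing import Dict, Iterable
from urllib.parse import urlencode


def content_model_url(api_base: str, titles: Iterable[str]) -> str:
    """Build one query asking for page info on all transcluded titles."""
    params = {
        "action": "query",
        "prop": "info",
        "format": "json",
        "titles": "|".join(titles),
    }
    return api_base + "?" + urlencode(params)


def content_models(response: dict) -> Dict[str, str]:
    """Map page title -> content model from a prop=info style response."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page.get("contentmodel", "wikitext")
        for page in pages.values()
    }


# Hand-written sample, not output from a real wiki.
sample_response = {
    "query": {
        "pages": {
            "1": {"title": "Template:T", "contentmodel": "some-html-model"},
            "2": {"title": "Template:U", "contentmodel": "wikitext"},
        }
    }
}

url = content_model_url("https://example.org/w/api.php",
                        ["Template:T", "Template:U"])
models = content_models(sample_response)
print(url)
print(models)
```

Even when titles are batched into one query, this is an extra round trip before any expansion can start, which is why this option is the expensive one.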
Clearly 4. is expensive and 3. seems hacky, but if it can be made to work, we can work with that.
But both Gabriel and I think that solution 2. is the cleanest solution for now that would work. The PHP parser (in your patch to handle {{T}}) already has information about the content model of T when it is expanding {{T}}, and it seems simplest and cleanest to return this information to clients for non-default content-model expansions. That gives clients like Parsoid the cleanest way of handling these.
If I am missing something or this is unclear, and this is getting into too much back and forth on email and it is simpler to discuss on IRC, I can hop onto any IRC channel on Monday, or we can do this on #mediawiki-parsoid, and one of us could later summarize the discussion back onto this thread.
Thanks, Subbu.
On 05/17/2014 02:54 AM, Daniel Kinzler wrote:
On 16.05.2014 21:07, Gabriel Wicke wrote:
On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
The one thing that will not work on wikis with $wgRawHtml disabled is parsing the output of expandtemplates.
Yes, which means that it won't work with Parsoid, Flow, VE and other users.
And it has been fixed now. In the latest version, expandtemplates will just return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.
I do think that we can do better, and I pointed out possible ways to do so in my earlier mail:
My preference would be to let the consumer directly ask for pre-expanded wikitext *or* HTML, without overloading action=expandtemplates. Even indicating the content type explicitly in the API response (rather than inline with an HTML tag) would be a better stop-gap as it would avoid some of the security and compatibility issues described above.
I don't quite understand what you are asking for... action=parse returns HTML, action=expandtemplates returns wikitext. The issue was with "mixed" output, that is, representing the expansion of templates that generate HTML in wikitext. The solution I'm going for now is to simply not expand them.
-- daniel