(Top posting to quickly summarize what I gathered from the discussion
and what would be required for Parsoid to expand pages with these
transclusions).
Parsoid currently relies on the MediaWiki API to preprocess
transclusions and return wikitext (it uses action=expandtemplates for
this), which it then parses using the native Parsoid pipeline. Parsoid
processes extension tags via action=parse and weaves the result back
into the top-level content of the page.
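To make the two calls concrete, here is a rough sketch of what the requests could look like. The endpoint URL is a placeholder, and the exact parameter set Parsoid sends is an assumption on my part; only the use of action=expandtemplates and action=parse comes from the description above.

```python
from urllib.parse import urlencode

API = "https://example.org/w/api.php"  # placeholder endpoint, not a real wiki


def expandtemplates_url(wikitext, title):
    """Build a request asking MediaWiki to preprocess transclusions to wikitext."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "title": title,      # page in whose context the text is expanded
        "text": wikitext,
        "prop": "wikitext",
    }
    return API + "?" + urlencode(params)


def parse_extension_url(wikitext, title):
    """Build a request asking MediaWiki to render an extension tag to HTML."""
    params = {
        "action": "parse",
        "format": "json",
        "title": title,
        "text": wikitext,
        "prop": "text",
        "contentmodel": "wikitext",
    }
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    print(expandtemplates_url("{{T}}", "P"))
    print(parse_extension_url("<ref>a note</ref>", "P"))
```

The result of the first call is fed back into Parsoid's own pipeline as wikitext, which is why the shape of that result matters for {{T}}.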
As per your original email, I am assuming that T is a page with a
special content model that generates HTML, and that another page P has
a transclusion {{T}}.
So, when Parsoid encounters {{T}}, it should be able to replace {{T}}
with the HTML to generate the right parse output for P.
So, I am listing below 4 possible ways action=expandtemplates can
process {{T}}:
1. Your newest implementation (which just returns {{T}} unchanged):
* If Parsoid gets back {{T}}, one of two things can happen:
--- Parsoid, as usual, tries to parse it as wikitext and gets stuck in
an infinite loop (query the MW API for the expansion of {{T}}, get back
{{T}}, parse it as {{T}}, query the MW API for the expansion of {{T}},
...). So, this will definitely not work.
--- Parsoid adds a special-case check to see whether the API sent back
{{T}} and, in that case, calls a different API endpoint
(action=expandtohtml maybe?) to get the HTML expansion, relying on an
assumption about the output of expandtemplates. This would work, but it
would require the new endpoint to be implemented, and it feels hacky.
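The two outcomes under option 1 can be sketched as follows. Everything here is a stand-in: fake_expandtemplates mimics an API that echoes {{T}} back, fake_expand_to_html stands for the hypothetical action=expandtohtml endpoint, and the max_rounds cap exists only so that the demonstration terminates.

```python
def fake_expandtemplates(wikitext):
    """Stand-in for the API under option 1: echoes an unexpandable transclusion."""
    return wikitext


def fake_expand_to_html(wikitext):
    """Hypothetical second endpoint (the 'action=expandtohtml' idea)."""
    return "<div>html for " + wikitext + "</div>"


def expand_naive(wikitext, max_rounds=5):
    """Re-expand while the result still looks like a transclusion.

    Without the max_rounds cap this would never return for an echoed {{T}}.
    """
    rounds = 0
    current = wikitext
    while current.startswith("{{") and rounds < max_rounds:
        current = fake_expandtemplates(current)
        rounds += 1
    return current, rounds


def expand_with_guard(wikitext):
    """Special-case check: if the API echoed the input, ask for HTML instead."""
    expanded = fake_expandtemplates(wikitext)
    if expanded == wikitext:
        return ("html", fake_expand_to_html(wikitext))
    return ("wikitext", expanded)


if __name__ == "__main__":
    print(expand_naive("{{T}}"))
    print(expand_with_guard("{{T}}"))
```

The guard works only because the client assumes that "output equals input" means "could not be expanded to wikitext", which is the part that feels hacky.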
So, going back to your original implementation, here are at least 3 ways
I see this working:
2. action=expandtemplates returns <html>...</html> for the expansion
of {{T}}, but also provides an additional API response header that tells
Parsoid that T was a special content model page and that the raw HTML
it received should not be sanitized.
3. action=expandtemplates returns <html>...</html> for the expansion of
{{T}} and gives no other indication of whether T is a special content
model page. However, if Parsoid (and other clients) are to always trust
this HTML output without sanitization, the expandtemplates
implementation should conditionally sanitize <html> tags encountered in
wikitext to prevent XSS. As far as I understand, expandtemplates (on
master, not your patch) does not do this tag sanitization. But,
independent of that, what Parsoid and clients need is a guarantee that
it is safe to blindly splice in the contents of any <html>...</html>
they receive for any {{T}}, no matter what content model T implements.
4. Parsoid first queries the MW-api to find out the content model of T
for every transclusion {{T}} it encounters on the page P and based on
the content-model info, knows how to process the output of
action=expandtemplates.
Clearly 4. is expensive and 3. seems hacky, but if it can be made to
work, we can work with that.
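To illustrate why 4. is expensive, here is a toy sketch of the extra lookups. The regex is a deliberately naive way to find transclusion targets (it ignores parser functions, namespaces and the like), the endpoint is a placeholder, and using action=query with prop=info to read the content model is my assumption about how such a lookup would be done.

```python
import re
from urllib.parse import urlencode

API = "https://example.org/w/api.php"  # placeholder endpoint, not a real wiki

# Naive pattern: the text between "{{" and the first "|", "{" or "}".
TRANSCLUSION = re.compile(r"\{\{([^{}|]+)")


def transcluded_titles(wikitext):
    """Return distinct transclusion targets in order of first appearance."""
    seen = []
    for match in TRANSCLUSION.finditer(wikitext):
        name = match.group(1).strip()
        if name not in seen:
            seen.append(name)
    return seen


def content_model_urls(wikitext):
    """One extra lookup per distinct target, on top of the expansion itself."""
    urls = []
    for name in transcluded_titles(wikitext):
        params = {
            "action": "query",
            "format": "json",
            "prop": "info",
            "titles": "Template:" + name,  # assumes the default namespace
        }
        urls.append(API + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    page = "{{A}} some text {{B|x=1}} and {{A}} again"
    print(transcluded_titles(page))
    print(len(content_model_urls(page)), "extra requests")
```

Even with deduplication, every distinct target costs a round trip before Parsoid knows how to interpret the expandtemplates output.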
But both Gabriel and I think that solution 2. is the cleanest one that
would work for now. The PHP parser (in your patch to handle {{T}})
already has information about the content model of T when it is
expanding {{T}}, and it seems simplest and cleanest to return this
information to clients for expansions of pages with a non-default
content model. That gives clients like Parsoid the cleanest way of
handling these.
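On the client side, handling for solution 2. could be as small as the sketch below. The header name X-Expansion-Content-Model is made up for illustration, as is the convention that a missing header means plain wikitext; the actual signalling mechanism would be whatever the API ends up exposing.

```python
HTML_OPEN = "<html>"
HTML_CLOSE = "</html>"


def handle_expansion(body, headers):
    """Decide how to treat an expansion, based on a hypothetical response header.

    Returns an (action, payload) pair describing what the client should do.
    """
    model = headers.get("X-Expansion-Content-Model", "wikitext")
    if model == "wikitext":
        # Default case: feed the result back into the wikitext pipeline.
        return ("parse-as-wikitext", body)
    # Non-default content model: the body is trusted HTML, so strip the
    # <html> wrapper if present and splice the contents in unsanitized.
    html = body
    if html.startswith(HTML_OPEN) and html.endswith(HTML_CLOSE):
        html = html[len(HTML_OPEN):-len(HTML_CLOSE)]
    return ("splice-raw-html", html)


if __name__ == "__main__":
    print(handle_expansion("'''bold'''", {}))
    print(handle_expansion(
        "<html><b>x</b></html>",
        {"X-Expansion-Content-Model": "some-html-model"},
    ))
```

The point is that the trust decision is driven by explicit metadata from the server, not by the client guessing from the shape of the returned text.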
If I am missing something or this is unclear, or if this is turning into
too much back and forth on email and it would be simpler to discuss it
on IRC, I can hop onto any IRC channel on Monday, or we can do this on
#mediawiki-parsoid, and one of us could later summarize the discussion
back onto this thread.
Thanks,
Subbu.
On 05/17/2014 02:54 AM, Daniel Kinzler wrote:
On 16.05.2014 21:07, Gabriel Wicke wrote:
On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
The one thing that will not work on wikis with $wgRawHtml disabled is
parsing the output of expandtemplates.
Yes, which means that it won't work with Parsoid, Flow, VE and other
users.
And it has been fixed now. In the latest version, expandtemplates will
just return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.
I do think that we can do better, and I pointed out possible ways to do
so in my earlier mail:
> My preference
> would be to let the consumer directly ask for pre-expanded wikitext *or*
> HTML, without overloading action=expandtemplates. Even indicating the
> content type explicitly in the API response (rather than inline with an HTML
> tag) would be a better stop-gap as it would avoid some of the security and
> compatibility issues described above.
I don't quite understand what you are asking for... action=parse returns
HTML, action=expandtemplates returns wikitext. The issue was with "mixed"
output, that is, representing the expansion of templates that generate
HTML in wikitext. The solution I'm going for now is to simply not expand
them.
-- daniel