On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
This means that HTML returned from the preprocessor needs to be valid in
wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
possible, but my impression is that you are shooting for something that's
closer to the behavior of a tag extension. Those already bypass the
sanitizer, so that would be less troublesome in the short term.
Yes. Just treat <html>...</html> like a tag extension, and it should work
fine.
Do you see any problems with that?
First of all you'll have to make sure that users cannot inject <html> tags
as that would enable arbitrary XSS. I might have missed it, but I believe
that this is not yet done in your current patch.
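To illustrate the injection concern, here is a minimal sketch of neutralizing user-typed <html> tags before the text reaches the preprocessor. It is not what the patch does: the function name and the regex approach are made up for this example, and a real fix would more likely live in the parser, which could trust only tags emitted by the preprocessor itself.

```python
import re

# Pseudo tag name as used in this thread; a different name could be chosen.
PSEUDO_TAG = "html"

# Matches the start of an opening or closing pseudo tag, e.g. "<html" or "</html".
_TAG_START = re.compile(r"<(/?)(%s)\b" % re.escape(PSEUDO_TAG), re.IGNORECASE)


def escape_user_pseudo_tags(wikitext):
    """Hypothetical helper: make user-typed pseudo tags render as literal text.

    Only the leading "<" of each pseudo tag is escaped, so the tag can no
    longer be mistaken for preprocessor output. Everything else is left
    untouched and still goes through the normal sanitizer.
    """
    return _TAG_START.sub(lambda m: "&lt;" + m.group(1) + m.group(2), wikitext)


if __name__ == "__main__":
    print(escape_user_pseudo_tags("a <html><script>x</script></html> b"))
```

The point is only that user input and preprocessor output need to be distinguishable; escaping is one way to get there, not necessarily the best one.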
In contrast to normal tag extensions, <html> would also contain fully
rendered HTML, and should not be piped through action=parse as is done in
Parsoid for tag extensions (in the absence of a direct tag extension expansion
API endpoint). We and other users of the expandtemplates API will have to
add special-case handling for this pseudo tag extension.
In HTML, the <html> tag is also not meant to be used inside the body of a
page. I'd suggest using a different tag name to avoid issues with HTML
parsers and potential name conflicts with existing tag extensions.
Overall it does not feel like a very clean way to do this. My preference
would be to let the consumer directly ask for pre-expanded wikitext *or*
HTML, without overloading action=expandtemplates. Even indicating the
content type explicitly in the API response (rather than inline with an HTML
tag) would be a better stop-gap as it would avoid some of the security and
compatibility issues described above.
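A rough sketch of what the explicit content type stop-gap could look like from the consumer side. The "contenttype" and "content" field names are invented for this example and are not part of the existing expandtemplates response; the helper name is hypothetical as well.

```python
import json


def classify_expansion(response_json):
    """Hypothetical consumer: branch on a declared content type.

    Assumes a response of the form
    {"expandtemplates": {"contenttype": ..., "content": ...}},
    which is an illustration only, not the current API output.
    """
    result = json.loads(response_json)["expandtemplates"]
    # Default to wikitext when no type is declared, so older responses
    # would keep working in this sketch.
    content_type = result.get("contenttype", "text/x-wiki")
    body = result["content"]
    if content_type == "text/html":
        return ("html", body)
    if content_type == "text/x-wiki":
        return ("wikitext", body)
    raise ValueError("unexpected content type: %s" % content_type)


if __name__ == "__main__":
    sample = json.dumps(
        {"expandtemplates": {"contenttype": "text/html", "content": "<b>x</b>"}}
    )
    print(classify_expansion(sample))
```

With the type carried out of band, the consumer never has to scan the body for a marker tag, so there is nothing for a user to inject and nothing for an HTML parser to trip over.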
So it is important to think of renderers as services, so that they are
usable from the content API and Parsoid. For existing PHP code this could
even be action=parse, but for new renderers without a need or desire to tie
themselves to MediaWiki internals I'd recommend thinking of them as their
own service. This can also make them more attractive to third-party
contributors from outside the MediaWiki world, as has for example recently
happened with Mathoid.
True, but that has little to do with my patch. It just means that third-party
Content objects should preferably implement getHtml() by calling out to a
service object.
You are right that it is not an immediate issue with your patch. The point
is about the *longer-term* role of the ContentHandler vs. the content API.
The ContentHandler could either try to be the central piece of our new
content API, or could become an integration point that normally calls out to
the content API and other services to retrieve HTML.
To me the latter is preferable as it enables us to optimize the content API
for high request rates by concentrating on doing one job well, and lets us
leverage this API from the server-side MediaWiki front-end through
ContentHandler.
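To make the "integration point" option a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; the real code would be PHP). The class name, the get_html method, and the fetch callable are all hypothetical stand-ins, and the fake service below replaces what would be an HTTP request to the content API.

```python
class IntegrationContentHandler(object):
    """Hypothetical handler that renders nothing itself.

    It only delegates to whatever service is plugged in, which is the
    "integration point" role described above.
    """

    def __init__(self, fetch_html):
        # fetch_html(title, revision) -> HTML string. In practice this would
        # wrap a call to the content API or another renderer service.
        self._fetch_html = fetch_html

    def get_html(self, title, revision):
        return self._fetch_html(title, revision)


def fake_content_api(title, revision):
    # Stand-in for a remote service; returns canned HTML for the example.
    return "<p>%s@%d</p>" % (title, revision)


if __name__ == "__main__":
    handler = IntegrationContentHandler(fake_content_api)
    print(handler.get_html("Foo", 7))
```

The handler stays thin, so the service behind it can be optimized or replaced without touching the MediaWiki front-end.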
Gabriel