On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
>> This means that HTML returned from the preprocessor needs to be valid wikitext, i.e. limited to the tags and attributes the sanitizer allows, to avoid being stripped out by the sanitizer. Maybe that's actually possible, but my impression is that you are aiming for something closer to the behavior of a tag extension. Tag extensions already bypass the sanitizer, so they would be less troublesome in the short term.
> Yes. Just treat <html>...</html> like a tag extension, and it should work fine. Do you see any problems with that?
First of all, you'll have to make sure that users cannot inject <html> tags themselves, as that would enable arbitrary XSS. I might have missed it, but I believe this is not yet done in your current patch.
Unlike normal tag extensions, <html> would contain fully rendered HTML, so it should not be piped through action=parse the way Parsoid does for tag extensions (in the absence of a direct API endpoint for tag extension expansion). We and other users of the expandtemplates API would have to add special-case handling for this pseudo tag extension.
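As a rough illustration of the kind of guard meant here, a minimal sketch in Python (the helper name is hypothetical and not taken from the patch) that escapes any literal <html> or </html> tag in user-supplied wikitext before expansion, so that only tags emitted by the preprocessor itself can carry raw HTML. A real fix would have to cover every path by which user input reaches the output (templates, parser functions, other tag extensions), which a single regex pass does not guarantee.

```python
import re

# Matches the start of an opening or closing <html> tag, case-insensitively,
# e.g. "<html>", "</HTML>", "<html class=x>", but not "<htmlfoo>".
_HTML_TAG = re.compile(r"<(/?)(html)(?=[\s>/])", re.IGNORECASE)


def neutralize_user_html_tags(wikitext: str) -> str:
    """Escape the '<' of every <html>/</html> tag in user-supplied wikitext.

    Hypothetical helper: run on user input before expansion, so that the
    output can only contain <html> wrappers produced by the preprocessor.
    """
    return _HTML_TAG.sub(lambda m: "&lt;" + m.group(1) + m.group(2), wikitext)
```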
In HTML, the <html> tag is also not meant to be used inside the body of a page. I'd suggest using a different tag name to avoid issues with HTML parsers and potential name conflicts with existing tag extensions.
Overall it does not feel like a very clean way to do this. My preference would be to let the consumer directly ask for pre-expanded wikitext *or* HTML, without overloading action=expandtemplates. Even indicating the content type explicitly in the API response (rather than inline via an HTML tag) would be a better stop-gap, as it would avoid some of the security and compatibility issues described above.
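To make the stop-gap concrete, here is a minimal sketch in Python of a response that carries the content type next to the body, and a consumer that dispatches on it. The ExpandResult shape and the to_html() helper are hypothetical; "text/x-wiki" is the format name MediaWiki uses for wikitext, and "text/html" stands for already-rendered output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExpandResult:
    # Hypothetical response shape: the content type travels next to the
    # body instead of being signalled by an inline <html> wrapper tag.
    content_type: str  # "text/x-wiki" (wikitext) or "text/html"
    body: str


def to_html(result: ExpandResult, parse_wikitext: Callable[[str], str]) -> str:
    """Turn an expansion result into HTML, dispatching on its declared type."""
    if result.content_type == "text/html":
        return result.body  # already fully rendered; do not re-parse
    if result.content_type == "text/x-wiki":
        return parse_wikitext(result.body)  # ordinary expanded wikitext
    raise ValueError(f"unsupported content type: {result.content_type}")
```

With the type carried out of band, a consumer never has to scan the body for a wrapper tag, and typing markup into a page cannot change how the result is classified.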
>> So it is important to think of renderers as services, so that they are usable from both the content API and Parsoid. For existing PHP code this could even be action=parse, but for new renderers that have no need or desire to tie themselves to MediaWiki internals, I'd recommend thinking of them as services in their own right. This can also make them more attractive to third-party contributors from outside the MediaWiki world, as recently happened with Mathoid, for example.
> True, but that has little to do with my patch. It just means that third-party Content objects should preferably implement getHtml() by calling out to a service object.
You are right that it is not an immediate issue with your patch. My point is about the *longer-term* role of ContentHandler vs. the content API. ContentHandler could either try to be the central piece of our new content API, or it could become an integration point that normally calls out to the content API and other services to retrieve HTML.
To me the latter is preferable: it lets us optimize the content API for high request rates by having it concentrate on doing one job well, while the server-side MediaWiki front-end can still use this API through ContentHandler.
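A minimal sketch of the integration-point model, written in Python rather than PHP for brevity: a Content object's get_html() (standing in for getHtml()) simply delegates to a rendering service. RenderService, ServiceBackedContent and StubMathService are hypothetical names; in practice the service would be a remote endpoint such as Mathoid, not an in-process object.

```python
from typing import Protocol


class RenderService(Protocol):
    """Hypothetical interface of an external rendering service."""

    def render(self, content_model: str, source: str) -> str: ...


class ServiceBackedContent:
    """Hypothetical Content object whose get_html() delegates to a service
    instead of rendering inside MediaWiki itself."""

    def __init__(self, content_model: str, source: str, service: RenderService) -> None:
        self.content_model = content_model
        self.source = source
        self.service = service

    def get_html(self) -> str:
        # The Content object only integrates; the service does the rendering.
        return self.service.render(self.content_model, self.source)


class StubMathService:
    """In-process stand-in for a remote service, used only for illustration."""

    def render(self, content_model: str, source: str) -> str:
        return f'<span class="{content_model}">{source}</span>'


content = ServiceBackedContent("math", "x^2", StubMathService())
print(content.get_html())  # <span class="math">x^2</span>
```

Keeping get_html() this thin is what lets the service be optimized and scaled on its own, independently of the PHP front-end.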
Gabriel