Is there a way we can narrow down this security check so it doesn't keep
breaking API requests, action=raw requests, and ResourceLoader requests,
etc?
Having the last param in a query string end in, say, ".png", ".svg",
".jpg", or ".ogg" is very frequent when dealing with uploaded files and
file pages. In addition to the reported breakages with ResourceLoader, I've
seen this problem break classic-style action=raw site CSS page loads
(action=raw&title=MediaWiki:Filepage.css) and API requests for
MwEmbed+TimedMediaHandler's video player, for the OEmbed extension I'm
fiddling with, etc.
The impression I get is that this is:
1) only exploitable on IE 6 (which is now a small minority and getting
smaller)
2) only exploitable if the path portion of the URL does not include an
unencoded period (eg 'api' or 'api%2Ephp' instead of 'api.php')
3) only exploitable if raw HTML fragments can be injected into the output,
eg a '<body' or other that triggers IE's HTML detection
For 1) I'm honestly a bit willing to sacrifice a few IE 6 users at this
point; the vendor's dropped support, shipped three major versions, and is
actively campaigning to get the remaining users to upgrade. :) But I get
wanting to protect them, so if we can find a workaround that's ok.
For 2) ... if we can detect this it would be great as we could avoid
breaking *any* api.php, index.php, or load.php requests in most real-world
situations.
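As a rough sketch of what the check for 2) might look like: the helper
below is hypothetical (the name pathHasUnencodedPeriod and the idea of
feeding it the raw, undecoded request URI are my assumptions, not existing
MediaWiki code). It only looks at the part of the URL before the first '?'
and reports whether it contains a literal period.

```php
<?php
/**
 * Hypothetical helper, not part of MediaWiki: report whether the path
 * portion of a raw (undecoded) request URI contains a literal period.
 */
function pathHasUnencodedPeriod( $rawRequestUri ) {
	// Everything before the first '?' is treated as the path portion.
	$queryPos = strpos( $rawRequestUri, '?' );
	$path = $queryPos === false
		? $rawRequestUri
		: substr( $rawRequestUri, 0, $queryPos );

	// The URI is deliberately not decoded, so '%2E' does not count.
	return strpos( $path, '.' ) !== false;
}
```

With something like this, '/w/api.php?titles=File:Foo.png' would be let
through, while '/w/api?titles=File:Foo.png' and
'/w/api%2Ephp?titles=File:Foo.png' would still get the strict check.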
For 3) ... formatted & XML output from the API should always be safe, as
even if it triggers XML/HTML detection, you can't slip in arbitrary <script>
bits. JSON output seems to be the problematic vector currently, as you can
manage to get arbitrary strings embedded in some places like error messages:
    {"warnings":{"siteinfo":{"*":"Unrecognized value for parameter 'siprop': <body onload=alert(1)>.html"}}}
On the other hand, if our JSON output escaped '<' and '>' characters, you'd
get this totally safe document:
    {"warnings":{"siteinfo":{"*":"Unrecognized value for parameter 'siprop': \u003Cbody onload=alert(1)\u003E.html"}}}
I tested this by slipping a couple lines into ApiFormatJson:
    $this->printText(
        $prefix .
        str_replace( '<', '\\u003C',
            str_replace( '>', '\\u003E',
                FormatJson::encode( $this->getResultData(),
                    $this->getIsHtml() ) ) ) .
        $suffix
    );
and can confirm that IE 6 doesn't execute the script bit.
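For anyone who wants to poke at the escaping outside of MediaWiki, here is
a minimal standalone sketch of the same idea. The function name
encodeJsonWithEscapedTags is made up for illustration, and it uses plain
json_encode() rather than FormatJson::encode(), so treat it as an
approximation of the patch above rather than the patch itself.

```php
<?php
/**
 * Hypothetical standalone helper: JSON-encode a value, then replace any
 * literal '<' and '>' with their \uXXXX escapes so the output contains
 * nothing that looks like an HTML tag.
 */
function encodeJsonWithEscapedTags( $data ) {
	$json = json_encode( $data );
	// In valid JSON, '<' and '>' only occur inside strings, so a blanket
	// replacement keeps the document valid and decodes to the same data.
	return str_replace(
		array( '<', '>' ),
		array( '\\u003C', '\\u003E' ),
		$json
	);
}

$data = array( 'warnings' => array( 'siteinfo' => array(
	'*' => "Unrecognized value for parameter 'siprop': <body onload=alert(1)>.html"
) ) );

echo encodeJsonWithEscapedTags( $data ), "\n";
```

A JSON parser turns the escapes back into '<' and '>', so API clients
shouldn't notice any difference. On PHP 5.3 and later, json_encode() also
accepts a JSON_HEX_TAG option that does this escaping natively.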
Are there any additional exploit vectors for API output other than HTML tags
mixed unescaped into JSON?
-- brion