Is there a way we can narrow down this security check so it doesn't keep breaking API requests, action=raw requests, and ResourceLoader requests, etc?
Having the last param in a query string end in, say, ".png", ".svg", ".jpg", or ".ogg" is very frequent when dealing with uploaded files and file pages. In addition to the reported breakages with ResourceLoader, I've seen this problem break classic-style action=raw site CSS page loads (action=raw&title=MediaWiki:Filepage.css) and API requests for MwEmbed+TimedMediaHandler's video player, for the OEmbed extension I'm fiddling with, etc.
The impression I get is that this is:
1) only exploitable on IE 6 (which is now a small minority and getting smaller)
2) only exploitable if the path portion of the URL does not include an unencoded period (e.g. 'api' or 'api%2Ephp' instead of 'api.php')
3) only exploitable if raw HTML fragments can be injected into the output, e.g. a '<body' or other tag that triggers IE's HTML detection
For 1) I'm honestly a bit willing to sacrifice a few IE 6 users at this point; the vendor has dropped support, shipped three major versions since, and is actively campaigning to get the remaining users to upgrade. :) But I get wanting to protect them, so if we can find a workaround that's ok.
For 2) ... if we can detect this it would be great as we could avoid breaking *any* api.php, index.php, or load.php requests in most real-world situations.
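A rough sketch of what the check in 2) could look like, assuming my impression of the exploit condition is right. pathHasUnencodedPeriod() is a made-up helper name, not existing MediaWiki code, and it has to be fed the raw, still percent-encoded request URI (something like $_SERVER['REQUEST_URI']), since a decoded '%2E' would look just like a real period:

```php
<?php
// Hypothetical helper, not part of MediaWiki: returns true when the raw
// (still percent-encoded) path of a request URI contains a literal period.
function pathHasUnencodedPeriod( $rawRequestUri ) {
	// Cut off the query string; only the path portion matters here.
	$queryStart = strpos( $rawRequestUri, '?' );
	$path = $queryStart === false
		? $rawRequestUri
		: substr( $rawRequestUri, 0, $queryStart );
	// Deliberately no urldecode(): '%2E' must not count as a period.
	return strpos( $path, '.' ) !== false;
}
```

With something like this, '/w/api.php?titles=File:Foo.png' would be let through, while '/w/api%2Ephp?x=a.html' and '/w/api?x=a.html' would still hit the existing check. It's deliberately coarse: a period anywhere in the path counts, which may or may not match what IE 6 actually looks at.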
For 3) ... formatted & XML output from the API should always be safe since, even if it triggers XML/HTML detection, you can't slip in arbitrary <script> bits. JSON output seems to be the problematic vector currently, as you can manage to get arbitrary strings embedded in some places like error messages:
{"warnings":{"siteinfo":{"*":"Unrecognized value for parameter 'siprop': <body onload=alert(1)>.html"}}}
On the other hand if our JSON output escaped '<' and '>' characters you'd get this totally safe document:
{"warnings":{"siteinfo":{"*":"Unrecognized value for parameter 'siprop': \u003Cbody onload=alert(1)\u003E.html"}}}
I tested this by slipping a couple lines into ApiFormatJson:
$this->printText(
	$prefix .
	str_replace(
		array( '<', '>' ),
		array( '\u003C', '\u003E' ),
		FormatJson::encode( $this->getResultData(), $this->getIsHtml() )
	) .
	$suffix
);
and can confirm that IE 6 doesn't execute the script bit.
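For anyone who wants to poke at the escaping outside of MediaWiki, here's a minimal standalone sketch. It's an approximation, not the actual patch: plain json_encode() stands in for FormatJson::encode(), and encodeJsonWithEscapedAngleBrackets() is a made-up name.

```php
<?php
// Hypothetical standalone helper; json_encode() stands in for
// MediaWiki's FormatJson::encode() here.
function encodeJsonWithEscapedAngleBrackets( $data ) {
	$json = json_encode( $data );
	// Literal '<' and '>' can only occur inside JSON strings, so swapping
	// them for \u escapes keeps the document valid and decodes to the
	// same values.
	return str_replace(
		array( '<', '>' ),
		array( '\u003C', '\u003E' ),
		$json
	);
}

// Same payload as the warning example above.
$data = array( 'warnings' => array( 'siteinfo' => array(
	'*' => "Unrecognized value for parameter 'siprop': <body onload=alert(1)>.html"
) ) );
$json = encodeJsonWithEscapedAngleBrackets( $data );
```

The output has no raw '<' or '>' left in it, and json_decode() gives back the original strings, so clients shouldn't notice the difference. On PHP 5.3+ the JSON_HEX_TAG option to json_encode() produces the same two escapes natively, which might be cleaner where it's available.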
Are there any additional exploit vectors for API output other than HTML tags mixed unescaped into JSON?
-- brion