On 03/06/11 06:56, Brion Vibber wrote:
The impression I get is this is:
1) only exploitable on IE 6 (which is now a small minority and getting
smaller)
IE 6 and some earlier versions of IE, at least back to 4.
2) only exploitable if the path portion of the URL does not include an
unencoded period (eg 'api' or 'api%2Ephp' instead of 'api.php')
3) only exploitable if raw HTML fragments can be injected into the output, eg a '<body' or other that triggers IE's HTML detection
HTML is a particularly dangerous exploit vector since it leads to XSS with no user interaction, but any content type can be faked. For example, if you use a .bat extension, IE will prompt you to execute the "batch file". So there's a potential for it being used for malware distribution. That's why we're denying all file extensions, not just .html.
For 1) I'm honestly a bit willing to sacrifice a few IE 6 users at this point; the vendor's dropped support, shipped three major versions, and is actively campaigning to get the remaining users to upgrade. :) But I get protecting, so if we can find a workaround that's ok.
We can't really do this without sending "Vary: User-Agent", which would completely destroy our cache hit ratio. For people who use Squid with our X-Vary-Options patch, it would be possible to use a very long X-Vary-Options header to single out IE 6 requests, but not everyone has that patch.
The patch we used for 1.16.5 included a User-Agent check, but I didn't realise the caching implications. It'll be removed for 1.16.6; that's one of the main reasons for doing a 1.16.6 release.
For 2) ... if we can detect this it would be great as we could avoid breaking *any* api.php, index.php, or load.php requests in most real-world situations.
The main issue here is that we don't have a wide variety of web servers set up for testing. We know that Apache lets you detect %2E versus dot via $_SERVER['REQUEST_URI'], but we don't know if any other web servers do that.
Note that checking for %2E alone is not sufficient: a lot of installations (including Wikimedia) have an alias /wiki -> /w/index.php which can be used to exploit action=raw.
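To make the REQUEST_URI idea concrete, here is a rough sketch of what such a check might look like. The function name pathHasUnencodedDot() is made up for illustration and is not in MediaWiki. It assumes the server passes the raw, undecoded URI through in $_SERVER['REQUEST_URI'], which is only known to hold for Apache.

```php
<?php
/**
 * Hypothetical helper, not part of MediaWiki.
 * Returns true if the path portion of a raw request URI contains a
 * literal, unencoded period. "%2E" is deliberately not decoded, so
 * "api%2Ephp" counts as dotless.
 */
function pathHasUnencodedDot( $requestUri ) {
	// Drop the query string; only the path portion is examined.
	$queryPos = strpos( $requestUri, '?' );
	$path = $queryPos === false
		? $requestUri
		: substr( $requestUri, 0, $queryPos );

	// Look for a literal "." without URL-decoding the path first.
	return strpos( $path, '.' ) !== false;
}
```

It would be called as pathHasUnencodedDot( $_SERVER['REQUEST_URI'] ). A request via an alias such as /wiki/Main_Page has a dotless path, so it comes back false and would still need the stricter handling.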
Are there any additional exploit vectors for API output other than HTML tags mixed unescaped into JSON?
Yes, all other content types, as I said above.
I think the current solution in trunk, plus the redirect idea that I've been discussing with Roan, is our best bet for now, unless someone wants to investigate $_SERVER['REQUEST_URI'].
In another post:
I know this has already been brought up, but that doesn't work for POST, and may not work for API clients that don't automatically follow redirects. (Which it looks like includes MediaWiki's ForeignAPIRepo since our Http class got redirection turned off by default a couple versions ago.)
If there is an actual problem with ForeignAPIRepo then we can look at server-side special cases for it. But r89248 should allow all API requests that have a dotless value in their last GET parameter, and a quick review of ForeignAPIRepo in 1.16 and trunk indicates that it always sends such requests.
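As an illustration of the "dotless value in the last GET parameter" rule, a simplified sketch follows. This is not the code from r89248. The function name lastParamHasDot() is hypothetical, and the sketch assumes that only literal dots in the raw query string matter.

```php
<?php
/**
 * Hypothetical helper, not the r89248 implementation.
 * Returns true if the value of the last GET parameter in a raw query
 * string contains a literal period, i.e. something that could be
 * mistaken for a file extension.
 */
function lastParamHasDot( $queryString ) {
	// Isolate the last "name=value" pair.
	$ampPos = strrpos( $queryString, '&' );
	$lastParam = $ampPos === false
		? $queryString
		: substr( $queryString, $ampPos + 1 );

	// Take the value part; a bare parameter with no "=" is treated
	// as a value in its own right.
	$eqPos = strpos( $lastParam, '=' );
	$value = $eqPos === false
		? $lastParam
		: substr( $lastParam, $eqPos + 1 );

	return strpos( $value, '.' ) !== false;
}
```

Under this sketch, "titles=Foo.jpg&format=json" passes because the last value is "json", while "format=json&titles=Foo.jpg" and "action=raw&.html" would both be refused.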
The current solution could theoretically break some API clients. An informative error message and a Location header will help the maintainers of the clients to update them.
Since we're talking about discarded solutions for this, maybe it's worth noting that I also investigated using a Content-Disposition header. The vulnerability involves an incorrect cache filename, and it's possible to override the cache filename using a Content-Disposition "filename" parameter. The reason I gave up on it is that we already use Content-Disposition for wfStreamFile():
header( "Content-Disposition: inline;filename*=utf-8'$wgLanguageCode'" . urlencode( basename( $fname ) ) );
IE 6 doesn't understand the charset specification, so it ignores the header and goes back to detecting the extension.
-- Tim Starling