On 03/06/11 06:56, Brion Vibber wrote:
The impression I get is this is:
1) only exploitable on IE 6 (which is now a small minority and getting
smaller)
IE 6 and some earlier versions of IE, at least back to 4.
2) only exploitable if the path portion of the URL does not include an
unencoded period (eg 'api' or 'api%2Ephp' instead of 'api.php')
3) only exploitable if raw HTML fragments can be injected into the output,
eg a '<body' or other that triggers IE's HTML detection
HTML is a particularly dangerous exploit vector since it leads to XSS
with no user interaction, but any content type can be faked. For
example, if you use a .bat extension, IE will prompt you to execute
the "batch file". So there's a potential for it being used for malware
distribution. That's why we're denying all file extensions, not just
.html.
For 1) I'm honestly a bit willing to sacrifice a few IE 6 users at this
point; the vendor's dropped support, shipped three major versions, and is
actively campaigning to get the remaining users to upgrade. :) But I get
protecting, so if we can find a workaround that's ok.
We can't really do this without sending "Vary: User-Agent", which
would completely destroy our cache hit ratio. For people who use Squid
with our X-Vary-Options patch, it would be possible to use a very long
X-Vary-Options header to single out IE 6 requests, but not everyone
has that patch.
The patch we used for 1.16.5 included a User-Agent check, but I didn't
realise the caching implications. It'll be removed for 1.16.6, that's
one of the main reasons for doing a 1.16.6 release.
For 2) ... if we can detect this it would be great as we could avoid
breaking *any* api.php, index.php, or load.php requests in most real-world
situations.
The main issue here is that we don't have a wide variety of web servers set
up for testing. We know that Apache lets you detect %2E versus dot via
$_SERVER['REQUEST_URI'], but we don't know if any other web servers do
that.
Note that checking for %2E alone is not sufficient: a lot of
installations (including Wikimedia) have an alias /wiki ->
/w/index.php which can be used to exploit action=raw.
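As a rough illustration of the $_SERVER['REQUEST_URI'] idea, the sketch below inspects the raw, still-encoded path for an encoded period and for a known entry point name. The function name and the list of script names are hypothetical, and it assumes the web server passes the request URI through undecoded, which is only known to hold for Apache. It mirrors the impression discussed above rather than a verified rule about IE.

```php
<?php
// Hypothetical helper, not MediaWiki code: a sketch of the check only.
// $requestUri is meant to be the raw value of $_SERVER['REQUEST_URI'].
function pathHasLiteralScriptName( $requestUri, array $knownScripts ) {
    // Keep only the path portion, still percent-encoded.
    $parts = explode( '?', $requestUri, 2 );
    $path = $parts[0];

    // An encoded period (%2E or %2e) in the path hides the real extension.
    if ( stripos( $path, '%2E' ) !== false ) {
        return false;
    }

    // Require the path to end in one of the expected script names, so that
    // aliases such as /wiki are not accepted by this check.
    foreach ( $knownScripts as $script ) {
        if ( substr( $path, -strlen( $script ) ) === $script ) {
            return true;
        }
    }
    return false;
}
```

With a list of '/api.php', '/index.php' and '/load.php', this accepts '/w/api.php?action=query' and rejects both '/w/api%2Ephp?action=query' and '/wiki/Foo?action=raw'.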
Are there any additional exploit vectors for API output other than HTML tags
mixed unescaped into JSON?
Yes, all other content types, as I said above.
I think the current solution in trunk, plus the redirect idea that
I've been discussing with Roan, is our best bet for now, unless
someone wants to investigate $_SERVER['REQUEST_URI'].
In another post:
I know this has already been brought up, but that doesn't work for POST, and
may not work for API clients that don't automatically follow redirects.
(Which it looks like includes MediaWiki's ForeignAPIRepo since our Http
class got redirection turned off by default a couple versions ago.)
If there is an actual problem with ForeignAPIRepo then we can look at
server-side special cases for it. But r89248 should allow all API
requests that have a dotless value in their last GET parameter, and a
quick review of ForeignAPIRepo in 1.16 and trunk indicates that it
always sends such requests.
The current solution could theoretically break some API clients. An
informative error message and a Location header will help the
maintainers of the clients to update them.
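To make the rule concrete, here is a minimal sketch of that kind of check: a request is allowed when its last GET parameter has no dot, and is otherwise answered with an explanatory message plus a Location header. This is not the code from r89248. The function names, the message text, and the dummy parameter appended to the redirect target are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch, not the code from r89248.
// Returns true if the last GET parameter (name and value) contains no dot.
function lastParamIsDotless( $queryString ) {
    if ( $queryString === '' ) {
        return true;
    }
    $pairs = explode( '&', $queryString );
    $last = end( $pairs );
    // Conservative assumption: treat a percent-encoded "." like a literal one.
    return strpos( rawurldecode( $last ), '.' ) === false;
}

// Describes how a request could be answered under this rule.
function describeResponse( $path, $queryString ) {
    if ( lastParamIsDotless( $queryString ) ) {
        return [ 'allowed' => true ];
    }
    // Assumed fix-up: append a dummy dotless parameter so that the last
    // parameter of the redirect target no longer contains a dot.
    $target = $path . '?' . $queryString . '&dummy=1';
    return [
        'allowed' => false,
        'headers' => [ 'Location: ' . $target ],
        'message' => 'The last GET parameter must not contain a dot; ' .
            'retry at the URL given in the Location header.',
    ];
}
```

A client that does not follow redirects still gets the message and the Location header, which is the point of making the error informative.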
Since we're talking about discarded solutions for this, maybe it's
worth noting that I also investigated using a Content-Disposition
header. The vulnerability involves an incorrect cache filename, and
it's possible to override the cache filename using a
Content-Disposition "filename" parameter. The reason I gave up on it
is because we already use Content-Disposition for wfStreamFile():
header( "Content-Disposition: inline;filename*=utf-8'$wgLanguageCode'" .
    urlencode( basename( $fname ) ) );
IE 6 doesn't understand the charset specification, so it ignores the
header and goes back to detecting the extension.
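For comparison, the two forms of the header can be sketched side by side: a plain "filename" parameter and the extended "filename*" parameter with a charset and language. The function name is hypothetical and this is not how wfStreamFile() is written. Note that the sketch uses rawurlencode() rather than the urlencode() call in the line above, so a space is encoded as %20.

```php
<?php
// Hypothetical helper for illustration only; not part of MediaWiki.
// Builds both header forms for the same file name.
function contentDispositionForms( $fname, $langCode ) {
    $base = basename( $fname );
    return [
        // Plain form: quoted string, with quotes and backslashes escaped.
        'plain' => 'Content-Disposition: inline;filename="' .
            addcslashes( $base, "\"\\" ) . '"',
        // Extended form: charset'language'percent-encoded-value.
        'extended' => "Content-Disposition: inline;filename*=utf-8'" .
            $langCode . "'" . rawurlencode( $base ),
    ];
}
```

Going by the description above, IE 6 would only act on the plain form, since it does not understand the charset specification in the extended one.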
-- Tim Starling