On Nov 12, 2004, at 6:46 AM, Daniel Kinzler wrote:
The fact that SVG, MIDI and other formats are blocked is getting really annoying. People complain about it over and over, and it's a bad situation also regarding the fact that the GFDL calls for the "transparent source" of a document.
As I understand it, those formats are blocked because MSIE interprets anything that *looks* like HTML as HTML. It was then stated that in order to circumvent this, a verifier would have to be written for all formats. I do not understand why this is so, and I would like to suggest a simple solution:
We have a heuristic check which attempts to match MSIE's heuristic test for HTML and rejects anything that matches. Hopefully it's good enough for that, though there may be other dangerous formats that it attempts to recognize, or other checks in the HTML heuristic which I might have missed.
MSIE's MIME type "detection" (the process in which it throws away the server's specified content-type information and pulls a new one out of its butt in an unreliable, insecure manner) is partially documented here: http://msdn.microsoft.com/workshop/networking/moniker/overview/appendix_a.asp
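
To make the idea of the heuristic check concrete, here is a minimal sketch in Python (not MediaWiki's actual code). The tag list and the 256-byte window are assumptions chosen for illustration; they are not a verified copy of MSIE's rules, and the function names are hypothetical.

```python
# Hypothetical sketch: the tag list and window size are assumptions for
# illustration, not MSIE's actual sniffing rules.
SNIFF_WINDOW = 256  # assumed number of leading bytes the browser inspects

SUSPICIOUS_TAGS = (
    b"<html", b"<head", b"<title", b"<body", b"<script",
    b"<a href", b"<img", b"<table", b"<pre",
)


def looks_like_html(data: bytes) -> bool:
    """Return True if the leading bytes contain anything resembling an HTML tag."""
    head = data[:SNIFF_WINDOW].lower()  # case-insensitive match
    return any(tag in head for tag in SUSPICIOUS_TAGS)


def accept_upload(data: bytes) -> bool:
    """Reject uploads that a content-sniffing browser might treat as HTML."""
    return not looks_like_html(data)
```

A file that starts with a valid GIF signature but carries a script tag right after it would be rejected by this check, which is the case that extension or image-type checks alone do not catch.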
MIDI is probably safe. It doesn't seem to be in IE's internally recognized list of types, so it shouldn't try to autodetect.
SVG is a more dangerous format; IIRC it explicitly allows for the use of JavaScript. Would you mind testing the main SVG-supporting browsers (particularly the Adobe SVG Viewer plug-in running in MSIE and Mozilla) to ensure that JavaScript in an SVG file can't access cookies or hijack the containing browser window?
- when a file is uploaded, run "file -bi" against that file and
remember the output, which is (a pretty good guess of) the mime-type of the file.
MediaWiki can't generally rely on 'file' since it's an external program. It may not give consistent results on all platforms, and is completely absent on some (such as Windows). It's also known to miss the MSIE holes: MSIE can detect HTML inside files that are actually valid images.
- have a map of mime-types-to-file-extensions. Look up the mime-type
returned by file in that table. If it mismatches the file extension, warn about it and refuse to upload. Skip the test if the mime-type is not in the table.
For known image types, we already check that the detected image type matches the extension.
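
The type-versus-extension check discussed above can be sketched without an external program, roughly as follows. This is an illustration in Python, not MediaWiki's implementation; the signature table, the extension map, and the function names are hypothetical and cover only a few image types.

```python
import os

# Leading byte signatures for a few image types (illustrative, not exhaustive).
MAGIC = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
)

# Hypothetical map of MIME types to the extensions accepted for them.
EXTENSIONS = {
    "image/png": {".png"},
    "image/gif": {".gif"},
    "image/jpeg": {".jpg", ".jpeg"},
}


def detect_type(data: bytes):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for prefix, mime in MAGIC:
        if data.startswith(prefix):
            return mime
    return None


def extension_matches(filename: str, data: bytes) -> bool:
    """Return False only when a recognized type disagrees with the extension."""
    mime = detect_type(data)
    if mime not in EXTENSIONS:
        return True  # type not in the table: skip the test
    ext = os.path.splitext(filename)[1].lower()
    return ext in EXTENSIONS[mime]
```

Note that a check like this only confirms the file starts like the claimed type; it does nothing about HTML-looking bytes later in the file.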
If we are concerned about viruses in general, why not run a virus scanner against every uploaded file? Uploads are not that frequent; the CPU should be able to cope with that.
Mainly we're concerned about JavaScript session hijacking, but other problems are a concern as well. Feel free to whip up a wrapper around clamav or something, that might be useful...
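
For anyone who wants to try it, a wrapper around clamav could start out something like the Python sketch below. It assumes the `clamscan` command-line tool is installed and on the PATH, and that `path` is a server-generated temporary file name rather than user input. The function name and the injectable `run` parameter are hypothetical, added so the logic can be exercised without a scanner present.

```python
import subprocess


def scan_upload(path, run=subprocess.run):
    """Return True if the file is clean, False if infected.

    Raises RuntimeError if the scanner itself reports an error.
    Relies on clamscan's exit codes: 0 = clean, 1 = virus found, 2 = error.
    """
    result = run(
        ["clamscan", "--no-summary", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError("clamscan failed with exit code %d" % result.returncode)
```

Treating a scanner error as a hard failure rather than as "clean" is a design choice here: an upload would be refused if the scan could not be completed.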
-- brion vibber (brion @ pobox.com)