2010/11/29 Erik Moeller erik@wikimedia.org:
As far as I understand the pure security (as opposed to content) concerns, these fall primarily into these categories:
- client-side execution of unsafe formats using designated applications (embedded macros, references to other malicious content etc.)
- exploitation of browser in-line display for purposes of XSS attacks or similar
Let me know if I'm missing a large category. I'm assuming server-side execution is not an issue for Wikimedia given correct server configuration.
Server side execution is not an issue, no.
The client-side issues can all be reduced to a file acting as type A to MediaWiki and as type B to the victim, where A is some harmless file type we'd like to allow users to upload and B is some potentially dangerous file type. This is usually enabled by one or more of the following factors:
* IE second-guesses the server-provided MIME type in favor of its own brain-dead MIME type detection algorithm, which in particular is extremely eager to treat things as HTML (causing any embedded JS to be executed): the presence of certain HTML tags or tag-like strings in the first 255 bytes is sufficient reason for IE to call something HTML
* File formats are often interpreted flexibly, so a file that doesn't conform to the standard completely may be read just fine by most applications. These flexibilities allow for creating a file that looks like an A but also comes close enough to being a B. For example, running an HTML page containing unified diff text in the middle through patch(1) will usually work, because patch(1) discards "garbage" before and after the diff. These flexibilities are usually undocumented and vary between applications, so it can be difficult to predict whether a file qualifies as "almost a B"
* Some file formats are designed in such a way that a file can actually be a completely valid A *and* a completely valid B all at the same time. This is the case for most ZIP and ZIP-like formats
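As a rough illustration of the first factor, here is a minimal Python sketch of a pre-upload check that looks for HTML-like strings in the window IE is described as sniffing. The function name and the marker list are hypothetical and chosen for illustration only; IE's real trigger list is not given here, so this is not a reproduction of its algorithm.

```python
# Size of the sniffing window as described above.
SNIFF_WINDOW = 255

# ASSUMPTION: an illustrative, incomplete marker list, not IE's actual one.
SUSPICIOUS_MARKERS = (b"<html", b"<head", b"<body", b"<script", b"<title")


def looks_like_html_to_sniffer(data: bytes) -> bool:
    """Return True if the start of the file contains an HTML-like marker."""
    head = data[:SNIFF_WINDOW].lower()
    return any(marker in head for marker in SUSPICIOUS_MARKERS)
```

A file that starts with a GIF header but carries `<script>` a few bytes later would be flagged, while the same string placed beyond the window would not. A check like this only covers the markers it knows about, which is exactly why second-guessing a sniffer is fragile.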
To illustrate the last sentence of the second bullet point, I'll quote Tim's blog post on upload security [1] (which is a fun read for anyone even mildly interested in the topic). It's part of the section on the GIFAR vulnerability, which involves a file that's a valid GIF or ZIP file, but which Java happily executes as a JAR (a ZIP-like format for executable Java bytecode) file because Java's JAR format validation is extremely lax, almost nonexistent. The only validation it does do is check for a certain magic number at the end of the file, so rejecting
"An alternative [to rejecting all ZIP files] would be to parse the entire zip directory and to reject any archives that contain a file with a .class extension. I can’t vouch for this method. **If you did this, the zip library you used would have to be exactly as tolerant of zip format errors as the one used by Java.** It would probably be best to actually shell out to Java to do the test."
(emphasis mine)
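For concreteness, the directory-scanning approach from the quote could be sketched in Python roughly as follows. This uses the standard `zipfile` module; the function name is hypothetical. It carries exactly the weakness the quote warns about: Python's ZIP parser is not guaranteed to be as tolerant as Java's, so a file Python rejects as malformed might still load as a JAR.

```python
import io
import zipfile


def lists_class_file(data: bytes) -> bool:
    """Return True if Python's zipfile finds a .class entry in the archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return any(
                name.lower().endswith(".class") for name in archive.namelist()
            )
    except zipfile.BadZipFile:
        # Python's parser gave up. Java's more tolerant reader might not,
        # so False here means "not detected", not "safe".
        return False
```

Note that a `False` result only says that this particular parser found nothing, which is why the quote suggests shelling out to Java itself for the test.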
Roan Kattouw (Catrope)