There have been some comments on old tasks such as T27707 (https://phabricator.wikimedia.org/T27707) about problems uploading files whose text metadata looks like HTML elements.
Years ago, we added security checks for IE 5/6/7 to work around IE's mime type sniffing: if you went to view a .png file directly in IE (as opposed to in an <img>) the browser would check the first few bytes of the file to detect its type, overriding the HTTP Content-Type header. HTML would be detected with a higher priority than the actual image formats, making it possible to create an actual .png image which when viewed as an image looked like an image, but when viewed as a web page was interpreted as HTML, including any embedded JavaScript.
(This was defense in depth in addition to hosting files on a separate domain; among other things, we sometimes serve files out from the main domain when dealing with archived (deleted) versions, and third-party installs are not guaranteed to have a second domain.)
Browsers have moved on, but the code remains and it trips up legitimate files containing links in metadata, or sometimes just random compressed data that looks like an element!
I've done a quick research check on feasibility:
* IE 6 and earlier can no longer access Wikimedia sites due to lack of SNI and TLS 1.0 or later
* IE 7 on Windows XP can no longer access Wikimedia sites due to lack of SNI
* IE 7 on Windows Vista **can** access Wikimedia sites.
* IE 8 and higher support X-Content-Type-Options: nosniff to disable sniffing, which we already use on all MediaWiki requests.
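As a rough illustration of the sniffing behaviour described above and of the nosniff opt-out, here is a deliberately simplified model in Python. It is not IE's actual algorithm (that is far more involved); the marker list, the function name, and the use of a 256-byte window as the sniffing range are assumptions made for the sketch.

```python
# Simplified model of content sniffing; NOT IE's real algorithm.
# HTML_MARKERS is an assumed, illustrative list of tag prefixes.
HTML_MARKERS = (b"<html", b"<head", b"<body", b"<script")
SNIFF_WINDOW = 256  # assumed number of leading bytes inspected


def effective_type(declared_type: str, body: bytes, nosniff: bool = False) -> str:
    """Return the type a sniffing browser would end up rendering."""
    if nosniff:
        # X-Content-Type-Options: nosniff tells the browser to trust the header.
        return declared_type
    head = body[:SNIFF_WINDOW].lower()
    if any(marker in head for marker in HTML_MARKERS):
        # In this model HTML takes priority over the declared image type.
        return "text/html"
    return declared_type


# A file that starts with the PNG signature but carries markup near the top.
payload = b"\x89PNG\r\n\x1a\n" + b"<script>alert(1)</script>"
print(effective_type("image/png", payload))                # text/html
print(effective_type("image/png", payload, nosniff=True))  # image/png
```

The point of the model is only that the declared Content-Type loses to the file contents unless the server opts out of sniffing.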
At some point Microsoft dropped the sniffing, but I'm not sure if it was a later IE version or an Edge version. No other browsers in reasonably current versions seem to have this problem.
So the only remaining browser version that might be affected is IE 7 on Windows Vista, which supports SNI and TLS 1.0. It might or might not still work once we drop TLS 1.0 some time in the future. (Per our TLS dashboard https://grafana.wikimedia.org/d/000000458/tls-ciphersuite-explorer?orgId=1 about 1.2% of our connections still use TLS 1.0, but this isn't broken down between logged-in-user views and anon views.)
Open questions:
* Should we drop the anti-sniff checks on upload?
* If we do, should we forbid logins with IE 7, or something else to protect the occasional IE 7 logged-in user from a hypothetical targeted drive-by attack? (Is it actually worth doing work and testing it for this?)
* Should we add X-Content-Type-Options: nosniff on files served from upload.wikimedia.org too?
-- brion
Hi,
On 1/28/19 3:58 PM, Brion Vibber wrote:
> Years ago, we added security checks for IE 5/6/7 to work around IE's mime type sniffing: if you went to view a .png file directly in IE (as opposed to in an <img>) the browser would check the first few bytes of the file to detect its type, overriding the HTTP Content-Type header. HTML would be detected with a higher priority than the actual image formats, making it possible to create an actual .png image which when viewed as an image looked like an image, but when viewed as a web page was interpreted as HTML, including any embedded JavaScript.
Tim wrote a nice blog post about how he reverse-engineered this: https://tstarling.com/blog/2008/12/secure-web-uploads/.
I don't have any comments on whether it's still needed, but if it's determined that MediaWiki can drop the checks, I'd like to see it turned into a PHP library...mostly because it's some neat code.
-- Legoktm
On Mon, Jan 28, 2019 at 10:58 PM Kunal Mehta <legoktm@member.fsf.org> wrote:
> Tim wrote a nice blog post about how he reverse-engineered this: https://tstarling.com/blog/2008/12/secure-web-uploads/.
>
> I don't have any comments on whether it's still needed, but if it's determined that MediaWiki can drop the checks, I'd like to see it turned into a PHP library...mostly because it's some neat code.
Heck yes. :)
My provisional patch is still keeping the code, but using it more conservatively: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/487527
The exact IE-based heuristics are still checked but will trip only on files matching the exact conditions that would affect old IE versions. Most uploaded JPEG or PNG files with HTML-ish links in EXIF metadata should not trigger it, as the tag strings won't appear in the first 256 bytes of the file.
The less-exact heuristics (held over from before Tim's addition of the reverse-engineered exact checks) are now less overbearing, and should no longer trigger on <a href, <img, <pre, <table, or <title tags. This also obsoletes the $wgAllowTitlesInSVG setting, which will now always be treated as true.
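To make the difference between the two kinds of check concrete, here is a small Python sketch. It is a hypothetical illustration, not the code in the patch: the function names are invented, the "exact" check is only a stand-in that looks at the first 256 bytes, and the old tag list is an assumption apart from the five tags named above as no longer triggering.

```python
EXACT_WINDOW = 256  # the first 256 bytes, per the description above

# Assumed old loose list; only the DROPPED tags come from the thread.
OLD_LOOSE_TAGS = (b"<a href", b"<html", b"<img", b"<pre",
                  b"<script", b"<table", b"<title")
DROPPED = (b"<a href", b"<img", b"<pre", b"<table", b"<title")
NEW_LOOSE_TAGS = tuple(t for t in OLD_LOOSE_TAGS if t not in DROPPED)


def contains_tag(chunk: bytes, tags) -> bool:
    """Case-insensitive search for any tag prefix in the chunk."""
    lowered = chunk.lower()
    return any(tag in lowered for tag in tags)


def exact_check(data: bytes) -> bool:
    # Stand-in for the IE-specific heuristics: only the head of the file counts.
    return contains_tag(data[:EXACT_WINDOW], OLD_LOOSE_TAGS)


def loose_check(data: bytes, tags=NEW_LOOSE_TAGS) -> bool:
    # Loose heuristic: scans the whole chunk it is given.
    return contains_tag(data, tags)


# A fake JPEG whose metadata link sits well past the first 256 bytes.
sample = b"\xff\xd8\xff\xe1" + b"\x00" * 400 + b'<a href="https://example.org/">credit</a>'
print(exact_check(sample))                  # False: tag is outside the window
print(loose_check(sample))                  # False: <a href is no longer flagged
print(loose_check(sample, OLD_LOOSE_TAGS))  # True: the old list would have blocked it
```

Under these assumptions, a file with a link in its metadata passes both checks, while the same file would have been rejected by the older, broader tag list.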
Feel free to help with review and testing -- especially welcome if folks have samples of JPEG, PDF, DjVu, or other files that were blocked before but should work so we can double-check. Thanks all!
-- brion
wikitech-l@lists.wikimedia.org