On 03/07/12 18:47, Kevin Day wrote:
Even setting aside the complexity of scanning PDFs, there's a lot of weirdness in a lot of files that even ClamAV doesn't catch. For example (with < and > replaced by [ and ] so this doesn't trigger anyone's mail spam filters):
strings images/wikipedia/commons/7/7c/Silvana_Suárez_7.jpg | tail -9
[!-- INICIO - PUBLICIDAD POP-UP UNDER --]
[IFRAME SRC="http://www.ciudad.com.ar/ar/popunder/p_submit.asp?site=personales.ciudad.com..." width=1 height=1][/IFRAME]
[SCRIPT LANGUAGE="JavaScript"]
//[!--
for (var i=1; i<15; i++){
setTimeout('self.focus();',i*30);
//--]
[/SCRIPT]
[!-- FIN - PUBLICIDAD POP-UP UNDER --]
This looks like the image was once stored on a free web host configured to append that content to every file it served, without excluding images. The modified file was then uploaded to Commons.
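One way to check for content appended like this is to walk the JPEG marker structure up to the End Of Image marker (FF D9) and see whether anything follows it. Below is a minimal Python sketch, assuming a standard baseline or progressive JPEG; `jpeg_trailing_bytes` is a hypothetical name, and this is not the detection tool mentioned elsewhere in the thread. It deliberately avoids simply searching for the last FF D9, since appended data can contain those bytes too.

```python
def jpeg_trailing_bytes(data: bytes) -> bytes:
    """Return the bytes that follow the JPEG End Of Image marker (FF D9).

    The marker segments are walked in order instead of searching for the
    last FF D9, because appended data (an HTML snippet, a RAR archive...)
    may itself contain FF D9. Raises ValueError on malformed input.
    """
    if data[:2] != b"\xff\xd8":  # every JPEG starts with SOI (FF D8)
        raise ValueError("not a JPEG: missing SOI marker")
    pos, n = 2, len(data)
    while pos < n:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        # A marker is FF followed by a code byte; extra FF bytes are fill.
        while pos < n and data[pos] == 0xFF:
            pos += 1
        if pos >= n:
            break
        marker = data[pos]
        pos += 1
        if marker == 0xD9:  # EOI: the image proper ends here
            return data[pos:]
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue  # TEM and RSTn markers have no length field
        if pos + 2 > n:
            break
        # Big-endian segment length; it counts its own 2 bytes, not the marker.
        pos += int.from_bytes(data[pos:pos + 2], "big")
        if marker == 0xDA:  # SOS: skip the entropy-coded scan data
            # In scan data a literal FF is stuffed as FF 00 and RSTn markers
            # may appear; any other FF xx pair is the next real marker.
            while pos + 1 < n and not (
                data[pos] == 0xFF
                and data[pos + 1] != 0x00
                and not 0xD0 <= data[pos + 1] <= 0xD7
            ):
                pos += 1
            if pos + 1 >= n:
                break
    raise ValueError("truncated JPEG: no EOI marker found")
```

For a spot check, pass it the result of `open(path, "rb").read()`; a non-empty result means something sits after the image data, as would be expected for a file carrying the pop-under HTML above.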
There are dozens of files that are valid JPEGs but have encrypted RAR archives appended after the JPEG data. It might be worthwhile to take every uploaded JPG/PNG/GIF/etc. and completely rewrite it before using it. Tools like jpegoptim and pngcrush are pretty good at taking "wild" images and rewriting them to remove any oddities.
-- Kevin
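The same kind of appended payload can ride on a PNG, one of the formats listed in the rewrite suggestion above. A PNG is an 8-byte signature followed by length-prefixed chunks that end with IEND, so detecting extra data only requires walking the chunk headers. A minimal Python sketch, assuming a well-formed PNG; `png_trailing_bytes` is a hypothetical name, CRCs are not verified, and this only detects trailing data rather than performing the full rewrite that pngcrush would.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_trailing_bytes(data: bytes) -> bytes:
    """Return the bytes that follow a PNG's IEND chunk.

    Each chunk is a 4-byte big-endian length, a 4-byte type, the chunk data
    and a 4-byte CRC (not verified here). Raises ValueError on malformed input.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: bad signature")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4  # header + data + CRC
        if end > len(data):
            break
        if chunk_type == b"IEND":
            return data[end:]
        pos = end
    raise ValueError("truncated PNG: no IEND chunk found")
```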
Appended RAR files are one of the things my tool detected. If you send me a list of the images, I can go through them and try to get them deleted.
Modifying the original images would be a bad idea. It would be better to forbid uploading such files (RARs are hard to block, since you need to scan the whole file...).
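The point about having to scan the whole file can be illustrated with a small signature scanner. A minimal Python sketch, assuming Python 3.9+ and that the archives start with the standard RAR 4.x or RAR 5.0 signature bytes; `find_rar_signatures` and `chunk_size` are hypothetical names, not part of any existing tool. A hit only marks a candidate, since those bytes could occur by chance inside compressed image data, and it says nothing about whether the archive is encrypted.

```python
import io
from typing import BinaryIO

# Archive signatures: RAR 1.5-4.x and RAR 5.0.
RAR_SIGNATURES = (b"Rar!\x1a\x07\x00", b"Rar!\x1a\x07\x01\x00")


def find_rar_signatures(stream: BinaryIO, chunk_size: int = 1 << 16) -> list[int]:
    """Return the sorted offsets of every RAR signature in a binary stream.

    The stream is read to the end in chunks, so memory use stays bounded
    even though every byte has to be examined.
    """
    keep = max(len(sig) for sig in RAR_SIGNATURES) - 1
    found = set()
    buf = b""
    base = 0  # stream offset of buf[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        for sig in RAR_SIGNATURES:
            i = buf.find(sig)
            while i != -1:
                found.add(base + i)  # the set absorbs repeats from the overlap
                i = buf.find(sig, i + 1)
        # Carry the last few bytes over so a signature split across two
        # reads is still matched on the next pass.
        if len(buf) > keep:
            base += len(buf) - keep
            buf = buf[-keep:]
    return sorted(found)


if __name__ == "__main__":
    sample = b"abc" + RAR_SIGNATURES[0] + b"0123456789" + RAR_SIGNATURES[1]
    print(find_rar_signatures(io.BytesIO(sample), chunk_size=3))  # [3, 20]
```

Reading in fixed-size chunks keeps memory bounded for large uploads, but the cost still grows with file size, which is exactly why blocking such files at upload time is expensive.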