We've once again been notified that our mirror of the Wikimedia images is "hosting malware". A quick check suggests the culprits are mostly newly uploaded PDFs containing one or more exploits, though a few other media types seem to be similarly affected.
I'm personally okay with ignoring it, since it's not hurting us any, but ideally I'd like to see files like this removed. Many of the infected PDFs appear to be Arabic-language documents that would interest people critical of their government, so what's going on here probably has bigger implications than random viruses being added to files.
I'm happy to scan everything again and post a list of the affected files. I'm also willing to automate this if it would help (periodic scans, with a list of all questionable files uploaded to a wiki page somewhere?). Does anyone have suggestions on what to do here?
-- Kevin
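The scan-and-list step could be sketched as follows. This is a minimal sketch, assuming ClamAV's `clamscan` is the scanner and that, when run with `--infected --no-summary`, it prints one `path: SIGNATURE FOUND` line per detection and exits with 0 (clean), 1 (something found) or 2 (error); `parse_clamscan_output` and `scan_tree` are hypothetical names, not part of any existing tool.

```python
import subprocess
from typing import List, Tuple


def parse_clamscan_output(text: str) -> List[Tuple[str, str]]:
    """Extract (path, signature) pairs from clamscan's 'path: SIG FOUND' lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(" FOUND"):
            continue  # skip blank lines, warnings and anything that is not a detection
        # Split on the LAST ': ' so that paths which themselves contain ': ' survive.
        path, sep, rest = line.rpartition(": ")
        if not sep:
            continue
        signature = rest[: -len(" FOUND")]
        hits.append((path, signature))
    return hits


def scan_tree(root: str) -> List[Tuple[str, str]]:
    """Recursively scan a directory tree with clamscan and return the detections."""
    result = subprocess.run(
        ["clamscan", "--recursive", "--infected", "--no-summary", root],
        capture_output=True,
        text=True,
    )
    # Exit code 0 means nothing found, 1 means detections, anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"clamscan failed: {result.stderr.strip()}")
    return parse_clamscan_output(result.stdout)
```

Keeping the parser separate from the subprocess call makes it easy to re-run the listing step on saved scan logs without rescanning.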
On Mar 11, 2013, at 1:02 PM, Kevin Day kevin@your.org wrote:
Added info:
Previous thread discussing this problem:
http://lists.wikimedia.org/pipermail/xmldatadumps-l/2012-July/000565.html
Bug filed about malicious PDFs from last time:
On Mon, Mar 11, 2013 at 11:02 AM, Kevin Day kevin@your.org wrote:
Kevin, regarding the current issue: the list you provided last time was helpful, because it let admins go through and delete the files. If you're able to generate that again, I think it would help.
For the longer-term issue, the WMF does not currently run uploads through a virus scanner, because of the performance cost and the false-positive rate. It would be great if we could get a bot to scan and flag files, so that we can shorten the time it takes to remove them.
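A scan-and-flag bot mostly needs to turn scanner hits into something admins can work through. A minimal sketch, assuming the mirror stores each original under its wiki file name (underscores for spaces) and that a human reviews the list before anything is deleted, given the false-positive concern; the page wording and the helper names `file_title_from_path` and `render_report` are hypothetical.

```python
import os
from typing import Iterable, Tuple


def file_title_from_path(path: str) -> str:
    """Turn a mirror path like '.../a/ab/Some_document.pdf' into a wiki file title."""
    name = os.path.basename(path)
    # Wiki titles are displayed with spaces where the stored name has underscores.
    return name.replace("_", " ")


def render_report(hits: Iterable[Tuple[str, str]], scanned_on: str) -> str:
    """Render (path, signature) detections as a wikitext list for human review."""
    lines = [f"Files flagged by an automated scan on {scanned_on} (needs human review):", ""]
    for path, signature in sorted(hits):
        title = file_title_from_path(path)
        # The leading ':' makes a plain link to the file page instead of embedding the file.
        lines.append(f"* [[:File:{title}]] ({signature})")
    return "\n".join(lines) + "\n"
```

Linking with `[[:File:...]]` rather than `[[File:...]]` matters here: the report page should never render the suspicious files inline.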
On Mar 11, 2013, at 2:10 PM, Chris Steipp csteipp@wikimedia.org wrote:
Kevin, regarding the current issue: the list you provided last time was helpful, because it let admins go through and delete the files. If you're able to generate that again, I think it would help.
I'll get started on that now.
For the longer-term issue, the WMF does not currently run uploads through a virus scanner, because of the performance cost and the false-positive rate. It would be great if we could get a bot to scan and flag files, so that we can shorten the time it takes to remove them.
If this is something you guys don't easily have the resources to do internally, I could probably put together something that runs on our end. There is some delay between a file being uploaded and it reaching us (I'm working with Ariel Glenn right now to determine what the latency is), but if you're happy with a rather slow turnaround, it wouldn't be hard for me to script what I'm doing. Periodically rescanning everything is probably better than only scanning on import; I'd be very surprised if virus scanners recognized the infections I'm seeing at the time the files were uploaded.
-- Kevin
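The argument for periodic rescans suggests keying rescans on the signature database rather than on the upload: whenever the database moves forward, every file last checked against an older version is due again. A minimal sketch, assuming the scanner exposes a monotonically increasing database version (ClamAV's daily database carries one) and that files are tracked by path; `ScanLedger` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScanLedger:
    """Remembers which signature-database version each file was last checked against."""
    last_scanned: Dict[str, int] = field(default_factory=dict)

    def files_to_scan(self, all_files: List[str], db_version: int) -> List[str]:
        """Return never-scanned files plus files last checked against an older database."""
        return [f for f in all_files if self.last_scanned.get(f, -1) < db_version]

    def record(self, files: List[str], db_version: int) -> None:
        """Mark files as checked against the given database version."""
        for f in files:
            self.last_scanned[f] = db_version
```

With a daily signature update this amounts to rescanning everything once per update, while skipping redundant passes on days when the signatures have not changed. Tracking files by content hash instead of path would also avoid rescanning renamed files.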
On 11/03/13 20:22, Kevin Day wrote:
If this is something you guys don't easily have the resources to do internally, I could probably put together something that runs on our end. There is some delay between a file being uploaded and it reaching us (I'm working with Ariel Glenn right now to determine what the latency is), but if you're happy with a rather slow turnaround, it wouldn't be hard for me to script what I'm doing. Periodically rescanning everything is probably better than only scanning on import; I'd be very surprised if virus scanners recognized the infections I'm seeing at the time the files were uploaded.
-- Kevin
I don't think it would really be a problem. I can try to run something on Labs. Ben installed a Swift instance in Labs 12 months ago, but it is probably better to download the files directly, as Swift is too resource-hungry for this.
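Downloading directly only needs the public URL of each original. A minimal sketch, assuming Commons originals are served from `upload.wikimedia.org` under MediaWiki's hashed upload layout (a directory named by the first hex digit, then one named by the first two hex digits, of the MD5 of the underscored file name) and that `file_name` is already in canonical form with its first letter capitalized; `original_file_url` and `download` are hypothetical helpers.

```python
import hashlib
import shutil
import urllib.parse
import urllib.request


def original_file_url(file_name: str,
                      base: str = "https://upload.wikimedia.org/wikipedia/commons") -> str:
    """Build the direct URL of an original upload under the hashed directory layout."""
    # Stored file names use underscores instead of spaces; the hash is taken over that form.
    db_key = file_name.replace(" ", "_")
    digest = hashlib.md5(db_key.encode("utf-8")).hexdigest()
    # e.g. .../a/ab/Name.pdf: first hex digit, then first two hex digits, then the name.
    return f"{base}/{digest[0]}/{digest[:2]}/{urllib.parse.quote(db_key)}"


def download(file_name: str, dest_path: str) -> None:
    """Stream one original file to local disk so it can be scanned there."""
    with urllib.request.urlopen(original_file_url(file_name)) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```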
I don't see how having the scan done elsewhere affects false positives. As for performance, phooey: that's a red herring. Simply have stand-alone scanners that copy what they want to scan and report back.
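Stand-alone scanners that copy what they want to check and report back, without sitting in the upload path, could look roughly like this. It is a minimal sketch that assumes the workers can read the originals (from a mirror or a direct download) and takes the actual scan as a pluggable function, for example one wrapping `clamscan`; `scan_copy` and `scan_all` are hypothetical names.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

# A scanner takes a local path and returns a signature name, or None if the file looks clean.
Scanner = Callable[[str], Optional[str]]


def scan_copy(source: str, scanner: Scanner) -> Optional[str]:
    """Copy one file into a private scratch directory and scan the copy."""
    with tempfile.TemporaryDirectory() as scratch:
        local = os.path.join(scratch, os.path.basename(source))
        shutil.copy2(source, local)  # the scanner never touches the served copy
        return scanner(local)


def scan_all(sources: Iterable[str], scanner: Scanner, workers: int = 4) -> Dict[str, str]:
    """Fan files out to worker threads and report back only the flagged ones."""
    sources = list(sources)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = pool.map(lambda src: scan_copy(src, scanner), sources)
        return {src: sig for src, sig in zip(sources, verdicts) if sig is not None}
```

Threads are a reasonable choice when the scan itself runs in an external process or is I/O-bound; a CPU-heavy pure-Python scanner would be better served by a process pool.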
xmldatadumps-l@lists.wikimedia.org