On Mar 11, 2013, at 2:10 PM, Chris Steipp <csteipp(a)wikimedia.org> wrote:
On Mon, Mar 11, 2013 at 11:02 AM, Kevin Day <kevin(a)your.org> wrote:
We've once again been notified that our mirror of the Wikimedia images is
"hosting malware". A quick check suggests these are mostly newly uploaded PDFs
containing one or more exploits, but a few files of other media types seem to
be similarly infected.
I'm personally okay with ignoring it, since it's not hurting us any, but ideally
I'd like to see files like this get removed. Many of the infected PDFs appear to be
Arabic-language documents that would interest people critical of their
government, so the implications of what's going on here are probably bigger than just
random viruses being added to files.
I'm happy to scan everything again and post a list of the affected files. I'm also willing to
automate this if it would help (periodic scans, with a list of all questionable
images uploaded to a wiki page somewhere?). Does anyone have suggestions on what to do here?
Kevin, for dealing with the current issue, the list you provided last time
was helpful, since admins could go through it and delete the files. If
you're able to generate that again, I think it would help.
I'll get started on that now.
For the longer-term issue, the WMF is not currently scanning uploads
with a virus scanner, because of the performance cost and the false-positive
rate. It would be great if we could get a bot to scan and flag files,
so we can shorten the time it takes to remove them.
If this is something that you guys don't easily have the resources to do internally, I
could probably come up with something that runs on our end. There is some
delay between a file being uploaded and it reaching us (I'm talking with Ariel
Glenn right now to determine what the latency is), but if you're happy with a rather
slow turnaround, it wouldn't be hard for me to script what I'm doing. Periodically
rescanning everything is probably better than scanning only on import; I'd be very
surprised if the infections I'm seeing had been known to virus scanners at
the time the files were uploaded.
-- Kevin
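For illustration, here is a minimal sketch of the kind of periodic rescan-and-report script discussed in this thread. It assumes ClamAV's `clamscan` is installed on the mirror host and that the mirror keeps original file names as the last path component; the function names (`scan_tree`, `parse_clamscan_output`, `to_wikitext`) and the wiki-list format are hypothetical, not an existing WMF or mirror tool.

```python
import subprocess
from pathlib import Path


def parse_clamscan_output(text):
    """Return (path, signature) pairs from clamscan lines like 'path: Sig FOUND'."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(" FOUND"):
            continue  # skip anything that is not an infection report
        path, sep, rest = line.rpartition(": ")
        if not sep:
            continue
        hits.append((path, rest[: -len(" FOUND")]))
    return hits


def scan_tree(root):
    """Recursively scan a mirror directory with clamscan (assumed installed)."""
    result = subprocess.run(
        ["clamscan", "--recursive", "--infected", "--no-summary", str(root)],
        capture_output=True,
        text=True,
    )
    # clamscan exit codes: 0 = nothing found, 1 = infections found, 2 = error
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr)
    return parse_clamscan_output(result.stdout)


def to_wikitext(hits):
    """Format hits as a wikitext bullet list linking to each File: page.

    Assumes the mirror file name matches the on-wiki file name (hypothetical).
    """
    lines = []
    for path, signature in sorted(hits):
        lines.append(f"* [[:File:{Path(path).name}]] ({signature})")
    return "\n".join(lines)
```

Run from cron, something like `print(to_wikitext(scan_tree("/path/to/mirror")))` would produce a list that admins could review; because every run rescans the whole tree, files that newer signatures flag are picked up even if they passed earlier scans.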