[Foundation-l] Code detecting bots?

Gregory Maxwell gmaxwell at gmail.com
Thu Aug 2 17:24:57 UTC 2007


On 8/2/07, David Gerard <dgerard at gmail.com> wrote:
> Really? I thought we ran "file" on uploads as well as looking at the extension.

We do. And if the detected type doesn't match what the extension says
it should be... we put a notice that no one notices on the image page.

Example:
http://commons.wikimedia.org/wiki/Image:Edvard_Grieg_-_03_-_In_The_Hall_Of_The_Mountain_King.ogg

(That file is an MP3 renamed to .ogg. Normally I just transcode all
of these, but I've left that one sitting around because its copyright
status looked suspect.)
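
For illustration, here is a rough sketch of the kind of check involved:
compare the extension against the leading "magic" bytes of the file.
This is not the MediaWiki code. The names MAGIC, sniff_format and
extension_mismatch are hypothetical, and the signature table is a
small, non-exhaustive assumption.

```python
import os

# Leading "magic" bytes for a few audio formats (illustrative only).
MAGIC = {
    "ogg": [b"OggS"],   # Ogg page capture pattern
    "mp3": [b"ID3"],    # MP3 that starts with an ID3v2 tag
    "flac": [b"fLaC"],  # native FLAC stream marker
    "wav": [b"RIFF"],   # RIFF container (WAV is the common case)
}


def sniff_format(header: bytes):
    """Guess a format from the first bytes of a file, or return None."""
    for fmt, prefixes in MAGIC.items():
        if any(header.startswith(p) for p in prefixes):
            return fmt
    # MP3 without an ID3v2 tag: an MPEG audio frame starts with 11 set bits.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return None


def extension_mismatch(filename: str, header: bytes):
    """Return (extension, detected) if they disagree, else None.

    An unrecognised header also returns None: this sketch only reports
    mismatches it can positively identify.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    detected = sniff_format(header)
    if detected is None or detected == ext:
        return None
    return (ext, detected)
```

Calling extension_mismatch("song.ogg", b"ID3\x03\x00") reports the
renamed-MP3 case above. A check like this only looks at the first few
bytes, so it can flag a misnamed file but says nothing about whether
the rest of the file is well formed.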

There is a constant, slow stream of misnamed files coming in. Every
few months I hit one and get a wild hair to go convert or delete all
the ones on Commons and enwiki.  I've found some weird stuff, some of
it even suspicious, but nothing yet that I am confident was malicious.

"[[User:Gmaxwell]] gets bored and takes a pushbroom to it once a year"
is not a scalable method of handling this stuff.

> Though I suppose that wouldn't protect against the "specially crafted
> malicious file" of security notice fame.

Even if we had the more aggressive filtering that you thought we had,
the risk of such files would remain.  For the most part, though, the
handlers for the formats we do support tend to be pretty robust,
internet-grade stuff.


