> >
> > Message: 4
> > Date: Thu, 4 Dec 2014 14:58:37 -0500
> > From: "Sreejith K." <sreejithk2000@gmail.com>
> > To: Wikimedia Commons Discussion List <commons-l@lists.wikimedia.org>
> > Subject: Re: [Commons-l] Duplicate removal?
> > Message-ID:
> >         <CAN8yy7Mtte+FPJ5N=hq=rQC3onOq5Vvtcixzt+mZ2kxfDAcdKQ@mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > I am using Wikimedia APIs to create a gallery of duplicates and routinely
> > clean them. You can see the results here.
> >
> > https://commons.wikimedia.org/wiki/User:Sreejithk2000/Duplicates
> >
> > The page also has a link to the script. If anyone is interested in using
> > this script, let me know and I can work with you to customize it.
> >
> > - Sreejith K.
> >
> >
>
See also https://commons.wikimedia.org/wiki/Special:ListDuplicatedFiles, which lists the files with the most byte-for-byte duplicates (most of the time those should really use file redirects).

--

Thanks Jonas for experimenting with this sort of thing. I've always wished we did something with perceptual hashes internally, in addition to the SHA-1 hashes we use currently.
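
To illustrate the difference: SHA-1 only catches byte-for-byte duplicates, while a perceptual hash tolerates small pixel changes. Below is a minimal sketch of the "average hash" technique using only the standard library; it assumes the image has already been decoded and downscaled to an 8x8 grayscale grid (real implementations, such as the imagehash library, handle decoding and resizing). This is an illustration, not how MediaWiki does it.

```python
import hashlib

def sha1_of(data: bytes) -> str:
    # Exact-duplicate fingerprint: any single byte change gives a new hash.
    return hashlib.sha1(data).hexdigest()

def average_hash(pixels_8x8: list) -> int:
    # Perceptual fingerprint: one bit per pixel, set if the pixel is
    # brighter than the image's average brightness. Similar images
    # produce hashes that differ in only a few bits.
    flat = [p for row in pixels_8x8 for p in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits  # 64-bit fingerprint

def hamming_distance(a: int, b: int) -> int:
    # Number of differing bits; small distance means "probably the same image".
    return bin(a ^ b).count("1")
```

Two re-encodings of the same photo would get different SHA-1s but near-identical average hashes, which is why a perceptual index would catch duplicates the current SHA-1 check misses.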

--bawolff