> >
> > Message: 4
> > Date: Thu, 4 Dec 2014 14:58:37 -0500
> > From: "Sreejith K." <sreejithk2000@gmail.com>
> > To: Wikimedia Commons Discussion List <commons-l@lists.wikimedia.org>
> > Subject: Re: [Commons-l] Duplicate removal?
> > Message-ID:
> >         <CAN8yy7Mtte+FPJ5N=hq=rQC3onOq5Vvtcixzt+mZ2kxfDAcdKQ@mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > I am using Wikimedia APIs to create a gallery of duplicates and routinely
> > clean them. You can see the results here.
> >
> > https://commons.wikimedia.org/wiki/User:Sreejithk2000/Duplicates
> >
> > The page also has a link to the script. If anyone is interested in using
> > this script, let me know and I can work with you to customize it.
> >
> > - Sreejith K.
> >
> >
>
See also https://commons.wikimedia.org/wiki/Special:ListDuplicatedFiles, which lists the files with the most byte-for-byte duplicates (most of the time those should really use file redirects).

--

Thanks Jonas for experimenting with this sort of thing. I've always wished we did something with perceptual hashes internally, in addition to the SHA-1 hashes we use currently.
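
To illustrate the difference: SHA-1 only catches byte-for-byte duplicates, while a perceptual hash tolerates small pixel changes. Below is a minimal sketch of the "average hash" technique using only the standard library; it assumes the image has already been decoded and downscaled to an 8x8 grayscale grid (real implementations, such as the imagehash library, handle decoding and resizing). This is an illustration, not how MediaWiki does it.

```python
import hashlib

def sha1_of(data: bytes) -> str:
    # Exact-duplicate fingerprint: any single byte change gives a new hash.
    return hashlib.sha1(data).hexdigest()

def average_hash(pixels_8x8: list) -> int:
    # Perceptual fingerprint: one bit per pixel, set if the pixel is
    # brighter than the image's average brightness. Similar images
    # produce hashes that differ in only a few bits.
    flat = [p for row in pixels_8x8 for p in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits  # 64-bit fingerprint

def hamming_distance(a: int, b: int) -> int:
    # Number of differing bits; small distance means "probably the same image".
    return bin(a ^ b).count("1")
```

Two re-encodings of the same photo would get different SHA-1s but near-identical average hashes, which is why a perceptual index would catch duplicates the current SHA-1 check misses.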

--bawolff