>
> Message: 4
> Date: Thu, 4 Dec 2014 14:58:37 -0500
> From: "Sreejith K." <sreejithk2000(a)gmail.com>
> To: Wikimedia Commons Discussion List <commons-l(a)lists.wikimedia.org>
> Subject: Re: [Commons-l] Duplicate removal?
> Message-ID:
>   <CAN8yy7Mtte+FPJ5N=hq=rQC3onOq5Vvtcixzt+mZ2kxfDAcdKQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I am using Wikimedia APIs to create a gallery of duplicates and routinely
> clean them. You can see the results here:
> https://commons.wikimedia.org/wiki/User:Sreejithk2000/Duplicates
>
> The page also has a link to the script. If anyone is interested in using
> this script, let me know and I can work with you to customize it.
>
> - Sreejith K.
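[Sreejith's script itself isn't reproduced here; as a rough sketch of the core idea, grouping files by the SHA-1 values the MediaWiki API can return (e.g. list=allimages with aiprop=sha1) surfaces byte-for-byte duplicates. The function name and input shape below are assumptions for illustration, not the actual script.]

```python
from collections import defaultdict

def find_duplicates(files):
    """Group file titles by SHA-1 and keep only groups with >1 member.

    `files` is an iterable of (title, sha1) pairs, such as one could
    build from the MediaWiki API's list=allimages with aiprop=sha1.
    """
    groups = defaultdict(list)
    for title, sha1 in files:
        groups[sha1].append(title)
    # Only hashes shared by two or more files are duplicates.
    return {h: titles for h, titles in groups.items() if len(titles) > 1}
```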
See also
https://commons.wikimedia.org/wiki/Special:ListDuplicatedFiles
which lists the files with the most byte-for-byte duplicates (most of the
time those should really be file redirects).
--
Thanks, Jonas, for experimenting with this sort of thing. I have always
wished we did something with perceptual hashes internally, in addition to
the SHA-1 hashes we use currently.
--bawolff
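[A minimal illustration of the perceptual-hash idea mentioned above, not Commons' actual internals: an "average hash" fingerprints image content so that near-duplicates (rescaled or lightly edited copies), which exact SHA-1 matching misses, end up a small Hamming distance apart. The 8x8 pre-scaled input and function names are assumptions for this sketch; real code would downscale with a library such as Pillow.]

```python
def average_hash(pixels):
    """Compute a 64-bit average hash from an 8x8 grayscale image.

    `pixels` is an 8x8 list of lists of integer luminance values,
    assumed already downscaled. Each bit is 1 if the pixel is above
    the mean luminance, giving a fingerprint that tolerates rescaling
    and minor edits, unlike an exact content hash.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Bits that differ between two hashes; small = likely near-duplicate."""
    return bin(a ^ b).count("1")
```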