Message: 4
Date: Thu, 4 Dec 2014 14:58:37 -0500
From: "Sreejith K." sreejithk2000@gmail.com
To: Wikimedia Commons Discussion List commons-l@lists.wikimedia.org
Subject: Re: [Commons-l] Duplicate removal?
Message-ID: <CAN8yy7Mtte+FPJ5N=hq=rQC3onOq5Vvtcixzt+mZ2kxfDAcdKQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I am using Wikimedia APIs to create a gallery of duplicates and routinely
clean them up. You can see the results here:
https://commons.wikimedia.org/wiki/User:Sreejithk2000/Duplicates
The page also has a link to the script. If anyone is interested in using this script, let me know and I can work with you to customize it.
- Sreejith K.
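For readers who want to try something similar, here is a minimal sketch of how a duplicate gallery could be assembled from the MediaWiki API. It is not the script linked from the page above. It assumes the `prop=duplicatefiles` query module driven by the `allimages` generator, a `formatversion=2` response shape, and that duplicate names come back with underscores; the helper names and the sample file titles are hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_duplicates_query(start_from, limit=50):
    """Build a query URL asking for duplicates of files starting at a title."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "allimages",   # walk files alphabetically
        "gaifrom": start_from,      # title to start from
        "gailimit": str(limit),
        "prop": "duplicatefiles",   # list other files with the same hash
    }
    return API_URL + "?" + urlencode(params)


def group_duplicates(response):
    """Map each file title to the sorted titles of its reported duplicates.

    Files without duplicates are skipped. Assumes each duplicate entry
    carries a "name" key using underscores instead of spaces.
    """
    groups = {}
    for page in response.get("query", {}).get("pages", []):
        dups = page.get("duplicatefiles", [])
        if dups:
            groups[page["title"]] = sorted(
                "File:" + d["name"].replace("_", " ") for d in dups
            )
    return groups


# Hypothetical response, shaped the way this sketch assumes.
sample = {
    "query": {
        "pages": [
            {"title": "File:Example A.jpg",
             "duplicatefiles": [{"name": "Example_A_copy.jpg"}]},
            {"title": "File:Example B.jpg"},
        ]
    }
}

print(build_duplicates_query("Example"))
print(group_duplicates(sample))
```

Fetching the URL and following the API's continuation parameters are left out on purpose, so the sketch runs without network access.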
See also https://commons.wikimedia.org/wiki/Special:ListDuplicatedFiles which lists the files that have the most byte-for-byte duplicates (really, most of the time those should use file redirects).
--
Thanks, Jonas, for experimenting with this sort of thing. I always wished we did something with perceptual hashes internally, in addition to the SHA-1 hashes we use currently.
--bawolff
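To illustrate the difference between the two kinds of hash mentioned above, here is a rough sketch, not anything MediaWiki does internally. It uses a simple "average hash" as a stand-in for perceptual hashing in general, and it assumes the image has already been reduced to a tiny grayscale grid (a real implementation would downscale to something like 8x8 first). The pixel grids and function names are made up for the example.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Exact-content hash: any changed byte gives a different digest."""
    return hashlib.sha1(data).hexdigest()


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 2x2 grayscale "images" (values 0-255).
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]    # same picture, slightly brightened
different = [[200, 10], [30, 220]]   # inverted pattern


def as_bytes(pixels):
    return bytes(v for row in pixels for v in row)


# SHA-1 sees the brightened copy as a completely different file...
print(sha1_hex(as_bytes(original)) == sha1_hex(as_bytes(brighter)))
# ...while the average hash treats it as identical (distance 0).
print(hamming(average_hash(original), average_hash(brighter)))
print(hamming(average_hash(original), average_hash(different)))
```

A small Hamming distance suggests a near-duplicate worth a human look; the threshold to use would depend on the hash size and the collection.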