Hi Fae,
Listing identical duplicates with 2 or more files matching would be
simpler but longer; at the moment I count 3,279 files like this on
Commons which took over 9 minutes to run. :-)
This is very interesting. I had a closer look at our matches, and it
seems that many of them are files with slight color variations, or
where the jpg has simply been compressed differently, so a sha1
wouldn't match them against each other. That suggests the matches we
find need a human to validate them case by case. My Python script is
still processing :-) but it has recorded 12,475 matches so far, which
also include your 3,279.
Your 3,279, though, seem fairly uncomplicated to do something about,
although perhaps they too need a human to assist, since the metadata
and use may vary?
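
A minimal Python sketch of the sha1 point: grouping by digest only
finds files that are byte-for-byte identical, so a recompressed copy
of the same picture ends up in a different group. The names sha1_of
and group_identical and the sample bytes are hypothetical, made up for
illustration, and are not taken from either of our scripts.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def sha1_of(data: bytes) -> str:
    """Return the hex SHA-1 digest of a file's raw bytes."""
    return hashlib.sha1(data).hexdigest()


def group_identical(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical.

    Only groups with 2 or more members are returned.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[sha1_of(data)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) >= 2]


# Hypothetical sample data: two exact copies and one "recompressed" copy.
sample = {
    "A.jpg": b"same bytes",
    "B.jpg": b"same bytes",
    "C.jpg": b"same bytes, recompressed",
}
duplicate_groups = group_identical(sample)
```

Here duplicate_groups holds A.jpg and B.jpg together, while C.jpg is
left out even though it could be the same picture to a human eye.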
Sincerely,
Jonas