Re: [Commons-l] Duplicate removal?

4 Dec 2014

Hi Fae,

...
  Listing identical duplicates with 2 or more files matching would be
  simpler but longer; at the moment I count 3,279 files like this on
  Commons which took over 9 minutes to run. :-)

This is very interesting. I had a closer look at our matches, and it
seems that many of them are files with slight color variations, or
where the JPEG has simply been compressed differently, so a SHA-1
comparison wouldn't match them against each other. That suggests the
matches we find need a human to validate them case by case. My Python
script is still processing :-) but it has so far recorded 12,475
matches, which also include your 3,279.
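
To illustrate why a SHA-1 comparison misses these near-duplicates, here
is a minimal sketch. It is not the script mentioned above: the function
names are made up for illustration, the "images" are just short lists of
grayscale values, and the average hash is only one simple stand-in for a
perceptual comparison, assumed here for the sake of the example.

```python
import hashlib
from collections import defaultdict


def sha1_groups(files):
    """Group file names by the SHA-1 of their raw bytes.

    `files` maps a (hypothetical) file name to its bytes. Only groups
    with two or more members are returned, i.e. exact duplicates.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha1(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the mean.

    `pixels` is a flat list of grayscale values (0-255), standing in for
    a heavily downscaled image.
    """
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# A toy "image" and a slightly brighter variant of it.
original = [10, 200, 30, 220, 15, 210, 25, 230]
brighter = [p + 3 for p in original]

files = {
    "a.jpg": bytes(original),
    "b.jpg": bytes(original),   # byte-identical copy of a.jpg
    "c.jpg": bytes(brighter),   # same picture, slightly different values
}

exact = sha1_groups(files)
distance = hamming(average_hash(original), average_hash(brighter))
```

Here the SHA-1 grouping pairs only a.jpg and b.jpg, while the average
hashes of the original and the brighter variant are identical. A small
distance like this is only a hint, which is why such matches would still
need a human to confirm them.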

But your 3,279 should be fairly uncomplicated to do something about, it
seems, though perhaps a human needs to assist there too, since the
metadata and usage may vary?

Sincerely,
Jonas
