We really need a better way to mark duplicates on Commons (and images
that are details from a larger work). A structure to record this is
something that probably ought to be on the radar for the new Structured
Data project.
As well as exact duplicates, there are often also different versions of
the same painting under different lighting, or scans of slightly
different reproductions of the same work. I don't know whether the
algorithm is permissive enough to pick all of these up, but as many as
it can find would be worth tagging as "other versions" of the same
underlying image.
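For illustration, this kind of fuzzy matching is often done with a
perceptual hash such as the "average hash". This is only a sketch of
that general technique, not Elog.io's actual algorithm; images are
represented here as 8x8 grids of greyscale values (in practice you
would downscale real images to 8x8 first):

```python
def average_hash(pixels):
    """Build a 64-bit hash: 1 where a pixel is above the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicates."""
    return bin(h1 ^ h2).count("1")

# Two 8x8 "images": the second is the first with uniformly brighter
# pixels, as with a scan of the same work under different lighting.
img_a = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
img_b = [[min(255, p + 10) for p in row] for row in img_a]

dist = hamming_distance(average_hash(img_a), average_hash(img_b))
print(dist)  # → 0: the brightened copy hashes identically
```

A rescaled or re-lit copy typically lands within a few bits of the
original, whereas two unrelated images differ in roughly half the bits,
so a small distance threshold picks up "other versions" of a work.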
In general, we probably wouldn't *remove* duplicate images, but we would
want to identify them as versions of each other.
All best,
James.
On 04/12/2014 08:25, Federico Leva (Nemo) wrote:
> Jonas Öberg, 04/12/2014 08:31:
>> In our work with Elog.io[1], we've come across a number of duplicate
>> files in Commons.
>
> Great!
>
>> Some of them are explainable, such as PNGs which also have a
>> thumbnail as JPG[2], but others seem to be more clear-cut duplicated
>> uploads, like [3] and [4], and yet others are the same work but at
>> different sizes, like [5] and [6].
>
> Are most of the cases you find perfect duplicates like these?
>
>> Going through this is quite an effort, and likely requires a bit of
>> manual work. Is there any organised structure/group of people that
>> deals with duplicate works? We'd love to contribute our findings to
>> such an effort once we clean up our data a bit.
>
> Sure. You can edit the files and add
> https://commons.wikimedia.org/wiki/Template:Duplicate
> If you need to report many thousands of files, it may be better to
> use a flagged bot account:
> https://commons.wikimedia.org/wiki/Commons:Bots/Requests
>
> Nemo
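For anyone scripting this, the core edit a bot would make per reported
file might look like the sketch below: prepending the duplicate tag to
the file page's wikitext. The template parameter (the file to keep) and
the file name are assumptions for illustration; check Template:Duplicate's
documentation before running anything for real.

```python
def tag_as_duplicate(page_wikitext, file_to_keep):
    """Return new wikitext with a duplicate tag at the top of the page.

    `file_to_keep` names the copy that should survive; the parameter
    layout of {{duplicate}} is assumed here, not taken from the docs.
    """
    tag = "{{duplicate|%s}}\n" % file_to_keep
    if tag in page_wikitext:          # don't tag the same page twice
        return page_wikitext
    return tag + page_wikitext

# Hypothetical page text and file name, purely for illustration.
old = "== {{int:filedesc}} ==\n{{Information|...}}"
new = tag_as_duplicate(old, "File:Example original.jpg")
print(new.splitlines()[0])  # → {{duplicate|File:Example original.jpg}}
```

In practice the edit itself would go through the MediaWiki API or a
framework like Pywikibot under the flagged bot account; this only shows
the wikitext change, which is where a mistake would be most visible.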
_______________________________________________
Commons-l mailing list
Commons-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l