We really need a better way to mark duplicates on Commons (and images
that are details from a larger work). A structure to record this is
something that probably ought to be on the radar for the new Structured
Data project.
As well as exact duplicates, there are often also different versions of
the same painting under different lighting, or scans of slightly
different reproductions of the same work. I don't know whether the
algorithm is permissive enough to pick all of these up, but as many as
it can find would be worth tagging as "other versions" of the same
underlying image.
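For illustration, this kind of fuzzy matching is often done with a
perceptual hash such as the "average hash". This is only a sketch of
that general technique, not Elog.io's actual algorithm; images are
represented here as 8x8 grids of greyscale values (in practice you
would downscale real images to 8x8 first):

```python
def average_hash(pixels):
    """Build a 64-bit hash: 1 where a pixel is above the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicates."""
    return bin(h1 ^ h2).count("1")

# Two 8x8 "images": the second is the first with uniformly brighter
# pixels, as with a scan of the same work under different lighting.
img_a = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
img_b = [[min(255, p + 10) for p in row] for row in img_a]

dist = hamming_distance(average_hash(img_a), average_hash(img_b))
print(dist)  # → 0: the brightened copy hashes identically
```

A rescaled or re-lit copy typically lands within a few bits of the
original, whereas two unrelated images differ in roughly half the bits,
so a small distance threshold picks up "other versions" of a work.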
In general, we probably wouldn't *remove* duplicate images, but we would
want to identify them as versions of each other.
All best,
James.
On 04/12/2014 08:25, Federico Leva (Nemo) wrote:
> Jonas Öberg, 04/12/2014 08:31:
>> In our work with Elog.io[1], we've come across a number of duplicate
>> files in Commons.
>
> Great!
>
>> Some of them are explainable, such as PNGs which also have a
>> thumbnail as JPG[2], but others seem to be more clear-cut duplicated
>> uploads, like [3] and [4], and yet others are the same work but at
>> different sizes, like [5] and [6].
>
> Are most of the cases you find perfect duplicates like these?
>
>> Going through this is quite an effort, and likely requires a bit of
>> manual work. Is there any organised structure/group of people that
>> deals with duplicate works? We'd love to contribute our findings to
>> such an effort once we clean up our data a bit.
>
> Sure. You can edit the files and add
> https://commons.wikimedia.org/wiki/Template:Duplicate
> If you need to report many thousands of files, it may be better to
> use a flagged bot account:
> https://commons.wikimedia.org/wiki/Commons:Bots/Requests
>
> Nemo
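For anyone scripting this, the core edit a bot would make per reported
file might look like the sketch below: prepending the duplicate tag to
the file page's wikitext. The template parameter (the file to keep) and
the file name are assumptions for illustration; check Template:Duplicate's
documentation before running anything for real.

```python
def tag_as_duplicate(page_wikitext, file_to_keep):
    """Return new wikitext with a duplicate tag at the top of the page.

    `file_to_keep` names the copy that should survive; the parameter
    layout of {{duplicate}} is assumed here, not taken from the docs.
    """
    tag = "{{duplicate|%s}}\n" % file_to_keep
    if tag in page_wikitext:          # don't tag the same page twice
        return page_wikitext
    return tag + page_wikitext

# Hypothetical page text and file name, purely for illustration.
old = "== {{int:filedesc}} ==\n{{Information|...}}"
new = tag_as_duplicate(old, "File:Example original.jpg")
print(new.splitlines()[0])  # → {{duplicate|File:Example original.jpg}}
```

In practice the edit itself would go through the MediaWiki API or a
framework like Pywikibot under the flagged bot account; this only shows
the wikitext change, which is where a mistake would be most visible.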
_______________________________________________
Commons-l mailing list
Commons-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l