Would it be possible to split the list into images that are
* byte-for-byte identical
* very different in size (e.g. more than a 2x difference; this is often
intentional, especially for large TIFFs)
* others?
I think this would be useful.
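A rough sketch of the kind of split meant here, in Python. It assumes each entry in the list is a pair of local file paths and that "size" means file size in bytes (comparing pixel dimensions instead would need an image library). The function names, the category labels and the 2.0 ratio are illustrative assumptions, not part of the existing comparison tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large TIFFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_pair(a: Path, b: Path, size_ratio: float = 2.0) -> str:
    """Put a candidate duplicate pair into one of three buckets.

    The labels and the default ratio are assumptions made for this sketch.
    """
    size_a, size_b = a.stat().st_size, b.stat().st_size
    # Files of different length cannot be byte-identical, so only hash on a size match.
    if size_a == size_b and sha256_of(a) == sha256_of(b):
        return "identical"
    small, large = sorted((size_a, size_b))
    if small == 0 or large / small > size_ratio:
        return "very_different_size"
    return "other"
```

Grouping the pairs by the returned label would give the three sub-lists. The hash is only computed when the sizes already match, which keeps the pass cheap for pairs that obviously differ.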
It would also be useful to do some further processing to identify images
which, though probably related, are *not* in fact duplicates because of
a notable difference somewhere (e.g. arrows or a legend added, or a
difference in some local blocks of colour). For example:
https://commons.wikimedia.org/wiki/File:Map_-_NL_-_Putten_-_Wijk_00_Putten_…
https://commons.wikimedia.org/wiki/File:Map_-_NL_-_Putten_-_Wijk_00_Putten_…
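One possible way to catch such local differences is a block-by-block comparison. The sketch below is only an illustration. It assumes both images have already been decoded to greyscale and scaled to the same dimensions, and are held as plain lists of rows of 0-255 values. The block size, the threshold and the function name are made-up values for the example, not anything the current comparison is known to use.

```python
from typing import List, Tuple

Grid = List[List[int]]  # rows of 0-255 grey values


def differing_blocks(a: Grid, b: Grid, block: int = 8,
                     threshold: float = 20.0) -> List[Tuple[int, int]]:
    """Return (block_row, block_col) for each block whose mean absolute
    pixel difference exceeds the threshold."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    height = len(a)
    width = len(a[0]) if height else 0
    flagged = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            total = count = 0
            # Blocks on the right and bottom edges may be smaller than `block`.
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    total += abs(a[y][x] - b[y][x])
                    count += 1
            if count and total / count > threshold:
                flagged.append((top // block, left // block))
    return flagged
```

A pair where only a handful of blocks are flagged and the rest match closely would be a candidate for "related but not a duplicate", such as the same map with one area recoloured.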
-- James.
On 04/12/2014 09:44, Jonas Öberg wrote:
> Hi Federico, and others,
>> Are most of the cases you find perfect duplicates
>> like these?
> I'm still running the comparison, but I made a first list of ~500
> duplicate works available here:
> http://belar.coyote.org/~jonas/wmcdups.html
> It would be very useful to get some feedback on that. Looking through
> some of those will give an idea of the kind of "duplicates" we find.
> Sincerely,
> Jonas
_______________________________________________
Commons-l mailing list
Commons-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l