On Thu, Dec 4, 2014 at 10:08 AM, James Heald <j.heald(a)ucl.ac.uk> wrote:
As well as exact duplicates, there may often also be different versions of
the same painting with different lighting, or scans of slightly different
reproductions of the same work. I don't know whether the algorithm is
permissive enough to pick all of these up, but as many as can be picked up
would be good to tag as "other versions" of the same underlying image.
In general, we probably wouldn't *remove* duplicate images, but we would
want to identify them as versions of each other.
We probably need a good definition of all these terms, because people
tend to have different interpretations of a 'duplicate'. E.g., for me
a lower-quality reproduction of a painting is a duplicate, but other
people on Commons define it more strictly: only 'downsized' versions
of a reproduction (that could also be made using the thumbnail
service) are considered to be duplicates. We also need to have
definitions for things like details, alternate angles, etc.