On 12/5/06, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
Someone came to me with an image they believed they had obtained from
Commons but were unsure of exactly where they found it...
I was eventually able to locate the image, but it took a lot of work.
Had the image been deleted for copyright problems, it would have been
nearly impossible, yet it is exactly in that sort of situation that we
may most need the ability to find the image.
I have a local database of image fingerprints (quantized color
histograms) that could be used to locate images... It didn't work in
this case because the image was newer than our last image backup
(which was last year). It might be useful, but it's not a complete
solution.
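A quantized color histogram fingerprint of the kind mentioned above can be sketched as follows. This is a minimal illustration under assumptions, not the actual database: the 4-levels-per-channel quantization, the L1 distance, and the function names are hypothetical, and pixels are passed as (r, g, b) tuples so no imaging library is needed. Unlike an exact hash, it still matches a mildly edited copy.

```python
def color_histogram(pixels, levels=4):
    """Fingerprint an image as a normalized histogram of quantized colors.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channels; each
    channel is reduced to `levels` buckets, giving levels**3 bins.
    """
    bins = [0.0] * (levels ** 3)
    count = 0
    for r, g, b in pixels:
        # Map 0..255 onto 0..levels-1 for each channel.
        qr, qg, qb = (c * levels // 256 for c in (r, g, b))
        bins[(qr * levels + qg) * levels + qb] += 1
        count += 1
    return [v / count for v in bins] if count else bins


def histogram_distance(h1, h2):
    """L1 distance: 0.0 for identical color distributions, 2.0 for disjoint ones."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_closest(query, fingerprints):
    """Return the name of the stored fingerprint nearest to `query`."""
    return min(fingerprints, key=lambda name: histogram_distance(query, fingerprints[name]))


# Toy "images": a red one, a blue one, and a slightly edited copy of the red one.
red = [(250, 10, 10)] * 100
blue = [(10, 10, 250)] * 100
edited_red = [(240, 20, 20)] * 90 + [(255, 255, 255)] * 10

database = {"Red.jpg": color_histogram(red), "Blue.jpg": color_histogram(blue)}
print(find_closest(color_histogram(edited_red), database))  # Red.jpg
```

The same distance function is what a "find similar images" feature would rank by; coarser quantization tolerates more editing at the cost of more false matches.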
A few hours ago, I wrote about adding an MD5 hash (or the like) to
each image entry in the database. That would also have helped find the
image in question, unless it had been altered.
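A per-image hash lookup could work roughly like this. It's a minimal sketch assuming an in-memory index keyed by hex digest; the real database schema isn't described here, and `build_index` and `lookup` are hypothetical names. It also shows the caveat above: changing even a single byte gives a different digest, so an altered copy is not found.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hex MD5 digest of an image file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def build_index(files):
    """Map digest -> image name for every stored file (hypothetical schema)."""
    return {file_digest(data): name for name, data in files.items()}


def lookup(index, data):
    """Return the image name for a byte-for-byte identical copy, else None."""
    return index.get(file_digest(data))


original = b"\xff\xd8 stand-in for real JPEG bytes \xff\xd9"
altered = original.replace(b"real", b"reel")  # a trivial one-word edit

index = build_index({"Example.jpg": original})
print(lookup(index, original))  # Example.jpg
print(lookup(index, altered))   # None
```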
I wanted to get input from the community on a couple of actions we
might like to take to improve the situation in the future:
# On upload we could attach the URL the image was uploaded as to the
image in an EXIF tag. There are a great many EXIF tags defined, and
I'm sure we could find a fitting one. This would only work for JPEGs,
but it would be easy to implement.
That could be done as part of the upload process. If we eventually
enable copy-from-web (again, some code of mine that was deactivated for
unknown reasons; next time, I'll set these things to "on" by default so
the gods in charge can't ignore it forever like they do now), we could
also include the "original" (pre-Commons) URL.
As a separate topic, we should consider adding license data to our
EXIF tags (I do it for my images), but we should perhaps do it more
generally. This would be fairly easy to do, and I don't think it would
be controversial, although there would be some complexity with respect
to image moves once we gain that ability in the future. Does anyone
object to this?
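Writing such tags into a JPEG means adding an APP1 segment that holds a small TIFF-style directory. The sketch below builds one by hand using only the standard library, assuming the source URL goes in ImageDescription (0x010E) and the license in Copyright (0x8298). That tag choice, the example URL, and the function names are assumptions, and the sketch does not merge with an EXIF segment the camera may already have written; a production version would more likely use an established EXIF library.

```python
import struct

IMAGE_DESCRIPTION = 0x010E  # TIFF/EXIF ImageDescription tag (assumed home for the URL)
COPYRIGHT = 0x8298          # TIFF/EXIF Copyright tag (assumed home for the license)


def build_exif_app1(tags):
    """Build a JPEG APP1 segment with one IFD of ASCII tags (big-endian TIFF)."""
    entries = sorted(tags.items())                 # IFD entries must be sorted by tag
    data_offset = 8 + 2 + 12 * len(entries) + 4    # header + count + entries + next-IFD
    ifd = struct.pack(">H", len(entries))
    data = b""
    for tag, text in entries:
        value = text.encode("ascii") + b"\x00"     # ASCII type (2) is NUL-terminated
        if len(value) <= 4:
            field = value.ljust(4, b"\x00")        # short values are stored inline
        else:
            field = struct.pack(">I", data_offset + len(data))  # offset from TIFF header
            data += value
            if len(data) % 2:
                data += b"\x00"                    # keep the next value word-aligned
        ifd += struct.pack(">HHI", tag, 2, len(value)) + field
    ifd += struct.pack(">I", 0)                    # no further IFDs
    tiff = b"MM\x00\x2a" + struct.pack(">I", 8) + ifd + data
    payload = b"Exif\x00\x00" + tiff
    return b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload


def add_exif(jpeg, tags):
    """Insert the EXIF segment after SOI (and after a JFIF APP0, if present)."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    if jpeg[2:4] == b"\xff\xe0":
        pos = 4 + struct.unpack(">H", jpeg[4:6])[0]  # length field includes itself
    return jpeg[:pos] + build_exif_app1(tags) + jpeg[pos:]


tagged = add_exif(b"\xff\xd8\xff\xd9", {       # SOI + EOI stand-in for a real JPEG
    IMAGE_DESCRIPTION: "http://example.org/Image:Example.jpg",  # hypothetical URL
    COPYRIGHT: "GFDL",
})
```

The offsets stored in the directory are relative to the start of the TIFF header ("MM..."), which is how EXIF readers resolve them.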
# We could also add the same information to PNG comments. Although
such use of PNG comments is non-standard, I don't think it would break
anything. Anyone have any thoughts on that?
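PNG's tEXt chunks are one place such text could go, and the PNG specification already predefines keywords such as "Source", "Copyright" and "Comment". A minimal stdlib-only sketch, assuming the text is Latin-1 (the encoding tEXt requires; iTXt would be needed for UTF-8); the function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def text_chunk(keyword, text):
    """Build a PNG tEXt chunk: keyword, NUL separator, Latin-1 text."""
    if not 1 <= len(keyword) <= 79:
        raise ValueError("tEXt keywords must be 1-79 bytes")
    data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    body = b"tEXt" + data
    crc = zlib.crc32(body) & 0xFFFFFFFF            # CRC covers chunk type + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)


def add_png_text(png, keyword, text):
    """Insert a tEXt chunk immediately after the IHDR chunk."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    ihdr_len = struct.unpack(">I", png[8:12])[0]
    end_of_ihdr = 8 + 4 + 4 + ihdr_len + 4         # length + type + data + CRC
    return png[:end_of_ihdr] + text_chunk(keyword, text) + png[end_of_ihdr:]
```

Decoders that don't care about text chunks skip them, which is why this is unlikely to break anything.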
# We could add some RDF tags to SVGs for the same purpose, although I
think the PNG rasterizations of SVGs would be more important.
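For SVGs, the usual place is an RDF block inside the `<metadata>` element, which is what Inkscape writes. A minimal sketch using ElementTree, assuming the Creative Commons `cc:` vocabulary (`http://creativecommons.org/ns#`) for the license and Dublin Core `dc:source` for the upload URL; the namespace choice, URLs, and function name are assumptions. Note that ElementTree drops XML comments and may rewrite prefixes when it re-serializes the file.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC = "http://creativecommons.org/ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Serialize with readable prefixes (SVG as the default namespace).
for prefix, uri in [("", SVG), ("rdf", RDF), ("cc", CC), ("dc", DC)]:
    ET.register_namespace(prefix, uri)


def add_svg_metadata(svg_text, source_url, license_url):
    """Append <metadata><rdf:RDF><cc:Work> carrying dc:source and cc:license."""
    root = ET.fromstring(svg_text)
    metadata = ET.SubElement(root, f"{{{SVG}}}metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF}}}RDF")
    work = ET.SubElement(rdf, f"{{{CC}}}Work", {f"{{{RDF}}}about": ""})
    ET.SubElement(work, f"{{{DC}}}source").text = source_url
    ET.SubElement(work, f"{{{CC}}}license", {f"{{{RDF}}}resource": license_url})
    return ET.tostring(root, encoding="unicode")


tagged_svg = add_svg_metadata(
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"><rect width="5" height="5"/></svg>',
    "http://example.org/Image:Example.svg",            # hypothetical upload URL
    "http://creativecommons.org/licenses/by-sa/2.5/",
)
```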
Adding licenses to the image will require changing the image on any
license-altering edit to the description page. It also means we need
to parse said description for license tags, unless, of course, we
limit this function to the license set on the upload page.
That said, I think either is a good idea.
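Parsing the description page for license tags could start from something like the following. It's a rough sketch: `LICENSE_TEMPLATES` is a hypothetical stand-in for the real list of license templates, the regex only handles templates whose parameters contain no nested braces, and template redirects are ignored. Names are normalized roughly the way MediaWiki treats titles (first letter case-insensitive, underscores equal to spaces).

```python
import re

# Hypothetical stand-in for the real list of license templates.
LICENSE_TEMPLATES = {"GFDL", "cc-by-sa-2.5", "cc-by-2.5", "PD-self"}

# {{Name}} or {{Name|params}}, where params contain no braces.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def normalize(name):
    """MediaWiki-style title normalization: '_' == ' ', first letter uppercased."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


KNOWN = {normalize(t) for t in LICENSE_TEMPLATES}


def find_licenses(wikitext):
    """Return the license templates used on a description page, in order."""
    found = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name = normalize(match.group(1))
        if name in KNOWN and name not in found:
            found.append(name)
    return found


page = """== Summary ==
{{Information|Description=A cat|Source=own work}}
== Licensing ==
{{GFDL}}
{{cc-by-sa-2.5|Me}}"""
print(find_licenses(page))  # ['GFDL', 'Cc-by-sa-2.5']
```

Limiting this to the license chosen on the upload form, as suggested above, would avoid the parsing entirely at the cost of missing later changes.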
# Finally, something that might be somewhat controversial: I think
it would be a good idea to add some text to the (raster) thumbnail
image on the image page. My idea is that we would add an extra white
area below the image large enough to contain a line of text which
mentions where the image came from. This would have a twofold
benefit: 1) it would encourage people to use the full-resolution image
for reuse, and 2) it would cause automated scraping processes that hit
our image pages to preserve human-readable tracking information.
Unlike a classic watermark this addition could be removed via
cropping. Technically this would require just a few more arguments to
ImageMagick during thumbnailing, but we'd have to make a few other
changes to handle smaller images and to treat the image page thumb
differently from other same-sized thumbs.
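On the thumbnailing side, the "few more arguments" might look like the common ImageMagick idiom below, which splices a white strip onto the bottom edge and annotates text into it. This is a sketch only: the strip height, point size, and caption text are hypothetical, `convert` is the ImageMagick 6 command name (ImageMagick 7 uses `magick`), and the handling of small images and of the image-page thumb mentioned above is not addressed.

```python
import subprocess


def caption_command(src, dst, caption, bar_height=20, point_size=12):
    """Build an ImageMagick command that adds a white strip below the image
    and draws `caption` centered in it."""
    return [
        "convert", src,
        "-gravity", "South",            # splice and annotate relative to the bottom edge
        "-background", "white",         # color of the added strip
        "-splice", f"0x{bar_height}",   # add bar_height rows at the bottom
        "-pointsize", str(point_size),
        "-annotate", "+0+4", caption,   # draw the text a few pixels above the edge
        dst,
    ]


def add_caption(src, dst, caption):
    """Run the command; requires ImageMagick's `convert` on PATH."""
    subprocess.run(caption_command(src, dst, caption), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with captions that contain spaces or quotes.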
I'm not sure it's worth the effort.
1) We already link to the high-res version in the line below the
image. Altering the thumbnail would force people to edit the image if
they don't want the high-res version (maybe they're on a modem?).
2) That would be useful for automated image scrapers that don't also
use the page and don't link back to Commons. Do you have an example
of this?
Also, IMHO such a bar would uglify (is that a word?) most images. And
our thumbnails of JPG images are JPGs as well; depending on the
compression, JPG doesn't render (small) text very well.
In general I think we need to think about how to push our metadata
into the image files themselves. Only if the metadata is embedded in
the images will downstream users have a hope of keeping track of the
images. If the rest of the world did this our lives would be much
easier, so let us do unto others as we would have others do unto us.
Agreed. But I'd also like for us to use existing data more within the
system. We already use EXIF data to categorize camera models, IIRC?
The images themselves contain data (color etc.); how about "similar
images"? (Yes, I know that's a big one; just dreaming here ;-)
I created a new Flickr account a few days ago, and I very much like
the "feel" of it. The whole site screams that it's designed for
images. Maybe we should think about tag/category clouds, pre-link
various image sizes, integrate mass-organization (like "show me my
images, select this and that, tag them with category XYZ"). I'm not
saying we should become Flickr, but we should learn from them.
Magnus