John Resig has just published some excellent data analysis combining TinEye, image archives, and image clustering and deduplication to identify identical and similar images across a large corpus.
http://ejohn.org/research/computer-vision-photo-archives/
Are we doing any Commons analysis like this at the moment? Is any similarity analysis done on upload to help uploaders identify copies of the same image that already exist online, or to flag potential copyvios for reviewers?
I'm sure TinEye would be glad to give us high-volume API access to enable that sort of cross-referencing.
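To make the upload-time check concrete, here is a minimal sketch of one common near-duplicate technique, an "average hash" compared by Hamming distance. This is only an illustration under my own assumptions: it is not TinEye's method, not necessarily what the linked research uses, and not something I know to be running on upload today. The function names, the 8x8 hash size, and the distance threshold of 5 are all hypothetical, and the input is assumed to be already decoded into a grid of grayscale values.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed already decoded)


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Downscale to size x size by averaging rectangular blocks of pixels."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # at least one source row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)  # at least one source column
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit integer: one bit per cell, set if above the mean."""
    small = [v for row in shrink(pixels, size) for v in row]
    mean = sum(small) / len(small)
    bits = 0
    for v in small:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Hypothetical threshold: treat images whose hashes differ in few bits as copies."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

An upload hook could compute this hash for each new file and look it up against hashes of existing files; a small distance would prompt the uploader or a reviewer to take a closer look. A hash like this only catches rescaled or lightly adjusted copies, so it would complement rather than replace an external service.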
SJ