Samuel Klein, 06/02/2014 23:39:
Are we doing any Commons analysis like this at the moment? Is any similarity analysis done on upload to help uploaders identify copies of the same image that already exist online, or to flag potential copyvios for reviewers?
I'm sure TinEye would be glad to give us high-volume API access to enable that sort of cross-referencing.
Would they? This is something we badly need, and we should do it for all uploads everywhere to save our patrollers a lot of precious time, but it has always looked impossible.
1) If the WMF is interested in helping, it would be useful to know. Even getting access to the existing search API key is a quest no hero is known to have completed, despite repeated attempts: https://wikitech.wikimedia.org/wiki/Web_search If it's possible to avoid institutional bottlenecks entirely, that would also be useful to know.
2) We don't even know what percentage of Wikimedia Commons images are included in TinEye, or how quickly they get there. Has anyone managed to extract this information from them?
As Fae says, a good part of the work is integrating the results into the patrollers' (and uploaders'?) workflow in a sensible way. Embedding it in UploadWizard may be too much, but a "simple" bot that just places a tag on suspicious images could also be made into an extension, if that is preferred to a plain pywikibot script. If the answers to the two points above are positive, it should be included in https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Wikimedia_Commons_.2F_multimedia since GSoC is approaching!
Nemo