About a month ago I sent an email about my GSoC project on porting
catimages to pywikibot-core with my mentors DrTrigon and jayvdb. As a
step towards that, we've made a library that analyzes files on Commons
using EXIF data and computer vision techniques, which will be used in the
bot. We recently released v0.1.0.
Currently, the library can identify MIME types, detect barcodes, detect
faces, read EXIF data, and measure the average color. You can read more
about the library at User:AbdealiJK/file-metadata, which contains
installation instructions and a simple pywikibot script that can be used
to analyze files on Commons.
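For anyone curious what two of these checks involve, here is a minimal,
standalone Python sketch. It is not how file-metadata implements them:
`sniff_mimetype`, `average_color` and the signature table are hypothetical
names made up for illustration, and the sketch assumes the image has
already been decoded into (R, G, B) tuples by some other library.

```python
# Hypothetical helpers for illustration only; not part of file-metadata.

# A few well-known file signatures ("magic numbers") and their MIME types.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mimetype(data: bytes) -> str:
    """Guess a MIME type from the leading bytes of a file's content."""
    for signature, mimetype in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mimetype
    return "application/octet-stream"  # unknown content


def average_color(pixels):
    """Return the per-channel mean of an iterable of (R, G, B) tuples."""
    totals = [0, 0, 0]
    count = 0
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return tuple(t / count for t in totals)


if __name__ == "__main__":
    print(sniff_mimetype(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # image/png
    print(average_color([(255, 0, 0), (0, 0, 255)]))  # (127.5, 0.0, 127.5)
```

Looking at the content rather than the file extension matters on Commons,
where a file's name and its actual format do not always agree.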
We've been running the library on a large number of files (35,000+) on
Commons to test for corner cases and to check its validity. You can find the logs of
that analysis on
. There are
some discrepancies in the results, and we would love to hear your
comments on them.
It would also be immensely helpful if users could install the library and
test it out. If you run into any problems, please open an issue on the
GitHub bug tracker or leave a note on the Talk page, so that we can help
you and also make the library more robust.
 - https://lists.wikimedia.org/pipermail/commons-l/2016-May/007740.html
 - https://commons.wikimedia.org/wiki/User:DrTrigon
 - https://commons.wikimedia.org/wiki/User:Jayvdb
 - https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata
 - https://github.com/AbdealiJK/file-metadata/issues