About a month ago I sent an email[0] about my GSoC project on porting catimages to pywikibot-core with my mentors DrTrigon[1] and jayvdb[2]. As a step towards that, we've written a library that analyzes files on Commons using EXIF data and computer vision techniques, and which the bot will use. We recently released v0.1.0.

Currently, the library can identify MIME types, detect barcodes, detect faces, read EXIF data, and measure the average color. You can read more about the library at User:AbdealiJK/file-metadata[3]. That page contains installation instructions and a simple pywikibot script that can be used to analyze files on Commons.
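
To give a rough idea of what two of these checks involve, here is a minimal sketch in plain Python using only the standard library. It is not the file-metadata API: the function names guess_mimetype and average_color are made up for illustration, the MIME type is guessed from the file name only, and the library itself may work quite differently.

```python
import mimetypes


def guess_mimetype(filename):
    """Guess a MIME type from the file extension alone (hypothetical helper)."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def average_color(pixels):
    """Return the mean (R, G, B) of an iterable of RGB tuples (hypothetical helper)."""
    total_r = total_g = total_b = 0
    count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return (total_r / count, total_g / count, total_b / count)


if __name__ == "__main__":
    # Two pixels, one black and one white, average to mid-grey.
    print(guess_mimetype("Example.jpg"))
    print(average_color([(0, 0, 0), (255, 255, 255)]))
```

In practice the pixel values would come from an image decoding library rather than a hand-written list.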

We've been running the library on a number of files (35,000+) on Commons to test for corner cases and to check its validity. You can find the logs of that analysis at https://commons.wikimedia.org/wiki/User:AbdealiJKTravis/logs . We have seen some discrepancies in the results, and it would be great to hear your comments on them.

It would also be immensely helpful if users could install the library and test it out. If any problems arise, please open an issue on the bug tracker[4] on GitHub or leave a note on the Talk page so that we can help you out and also make the library more robust.

Abdeali JK


[0] - https://lists.wikimedia.org/pipermail/commons-l/2016-May/007740.html
[1] - https://commons.wikimedia.org/wiki/User:DrTrigon
[2] - https://commons.wikimedia.org/wiki/User:Jayvdb
[3] - https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata
[4] - https://github.com/AbdealiJK/file-metadata/issues