On Thu, Dec 11, 2014 at 1:16 PM, Guillaume Paumier gpaumier@wikimedia.org wrote:
Yesterday, I finished implementing the script for Commons and started running it. As of today, we have accurate numbers for the quantity of files missing machine-readable metadata on Commons: ~533,000, out of ~24 million [4]. It may seem like a lot, but I personally think it's a great testament to the dedication of the Commons community.
Wonderful. Thanks!
Now that we have numbers, we can work on going through those files and fixing them. Many of them are missing the {{information}} template, but many of those are also part of a batch: either they were uploaded by the same user, or they were mass-uploaded by a bot. In either case, this makes it easier to parse the information and add the {{information}} template automatically with a bot, thus avoiding painful manual work.
I've been poking at all of this with a stick in my free time, and it's true that a good number of these images are part of a set of images and the patterns are readily apparent. Magnus's No Information tool on labs is enormously helpful for retrieving these pattern sets since it's searchable by file name or the user/bot who uploaded the images[1]. I highly recommend it.
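For anyone who wants to look for batches offline, here is a rough Python sketch of the grouping idea: bucket files by uploader and keep only the buckets large enough to be worth a bot request. The file titles, uploader names, and the `group_by_uploader` helper are all made up for illustration; this is not how the No Information tool works internally, and it does not call any Commons API.

```python
from collections import defaultdict


def group_by_uploader(files, min_batch=2):
    """Group (title, uploader) pairs by uploader.

    Only uploaders with at least `min_batch` files are returned, since
    single files are unlikely to share a parseable pattern.
    """
    batches = defaultdict(list)
    for title, uploader in files:
        batches[uploader].append(title)
    return {
        uploader: sorted(titles)
        for uploader, titles in batches.items()
        if len(titles) >= min_batch
    }


# Hypothetical sample data, not real Commons files or accounts.
sample = [
    ("File:Survey 001.jpg", "ExampleBot"),
    ("File:Survey 002.jpg", "ExampleBot"),
    ("File:Holiday.jpg", "SomeUser"),
]

batches = group_by_uploader(sample)
```

With the sample above, only the two "ExampleBot" uploads form a batch; the lone "SomeUser" file is left out. The same bucketing could be keyed on a shared file name prefix instead of the uploader.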
Once you identify a pattern, you're encouraged to add a section to the Bot Requests page on Commons, so that a bot owner can fix them:
https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Adding_the_Inf...
Challenge accepted[2].
1. https://tools.wmflabs.org/add-information/no_information.php?language=common...
2. https://commons.wikimedia.org/w/index.php?title=Commons:Bots/Work_requests&a...