On Thu, Dec 11, 2014 at 1:16 PM, Guillaume Paumier <gpaumier@wikimedia.org> wrote:

Yesterday, I finished implementing the script for Commons, and started
to run it. As of today, we have accurate numbers for the quantity of
files missing machine-readable metadata on Commons: ~533,000, out of
~24 million [4]. It may seem like a lot, but I personally think it's a
great testament to the dedication of the Commons community.

Wonderful. Thanks!

Now that we have numbers, we can work on going through those files and
fixing them. Many of them are missing the {{information}} template,
but many of those are also part of a batch: either they were uploaded
by the same user, or they were mass-uploaded by a bot. In either case,
this makes it easier to parse the information and add the
{{information}} template automatically with a bot, thus avoiding
painful manual work.

I've been poking at all of this with a stick in my free time, and it's true that a good number of these images belong to sets whose patterns are readily apparent. Magnus's No Information tool on Labs is enormously helpful for retrieving these sets, since it can be searched by file name or by the user or bot who uploaded the images [1]. I highly recommend it.
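
For anyone who would rather eyeball batches offline, here is a rough sketch of the same idea in Python. It assumes you have already collected (file title, uploader) pairs by some means; the function name, the sample titles, and the uploader names are all made up for illustration, and this is not how Magnus's tool works internally. It groups files by uploader and reports the filename prefix each group shares, which is often enough to spot a mass upload.

```python
import os
from collections import defaultdict


def group_batches(files, min_size=2):
    """Group (title, uploader) pairs by uploader.

    Returns a dict mapping each uploader with at least `min_size` files
    to a tuple of (shared filename prefix, sorted list of titles).
    """
    by_uploader = defaultdict(list)
    for title, uploader in files:
        by_uploader[uploader].append(title)

    batches = {}
    for uploader, titles in by_uploader.items():
        if len(titles) >= min_size:
            # commonprefix compares character by character, so it works
            # on arbitrary strings, not only on filesystem paths.
            batches[uploader] = (os.path.commonprefix(titles), sorted(titles))
    return batches


# Hypothetical sample data, not real Commons files.
sample = [
    ("File:Museum scan 001.jpg", "ExampleBot"),
    ("File:Museum scan 002.jpg", "ExampleBot"),
    ("File:Holiday photo.jpg", "ExampleUser"),
]

batches = group_batches(sample)
for uploader, (prefix, titles) in batches.items():
    print(uploader, repr(prefix), len(titles))
```

A long shared prefix together with a large group is a reasonable hint that the files follow one pattern and could be handed to a bot as a single request.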

Once you identify a pattern, you're encouraged to add a section to the
Bot Requests page on Commons, so that a bot owner can fix them:
https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Adding_the_Information_template_to_files_that_don.27t_have_it

Challenge accepted[2].

1. https://tools.wmflabs.org/add-information/no_information.php?language=commons&project=wikimedia&startswith=&user=

2. https://commons.wikimedia.org/w/index.php?title=Commons:Bots/Work_requests&diff=prev&oldid=142316425

Keegan Peterzell
Community Liaison, Product
Wikimedia Foundation