Hi,
A quick follow-up: We now have a reliable count of the files missing machine-readable metadata on Commons (spoiler: it's about 500,000 files, meaning ~98% of files have machine-readable markers). Below is a copy of the update I sent to commons-l:
https://lists.wikimedia.org/pipermail/commons-l/2014-December/007431.html
---------- Forwarded message ----------
From: Guillaume Paumier <gpaumier@wikimedia.org>
Date: Thu, Dec 11, 2014 at 8:16 PM
Subject: File metadata cleanup drive: We now have numbers for Commons
To: Coordination of technology deployments across languages/projects <wikitech-ambassadors@lists.wikimedia.org>, Wikimedia Commons Discussion List <commons-l@lists.wikimedia.org>
Greetings,
As many of you are aware, we're currently in the process of collectively adding machine-readable metadata to the many files and templates that don't have it, both on Commons and on all other Wikimedia wikis with local uploads [1,2]. This makes it much easier to see and re-use multimedia files in line with best practices for attribution across a variety of channels (offline, PDF exports, mobile platforms, MediaViewer, WikiWand, etc.).
In October, I created a dashboard to track how many files were missing the machine-readable markers on each wiki [3]. Unfortunately, due to the size of Commons, I needed to find another way to count them there.
Yesterday, I finished implementing the script for Commons and started running it. As of today, we have accurate numbers for the files missing machine-readable metadata on Commons: ~533,000, out of ~24 million [4]. It may seem like a lot, but I personally think it's a great testament to the dedication of the Commons community.
Now that we have numbers, we can work on going through those files and fixing them. Many of them are missing the {{information}} template, but many of those are also part of a batch: either they were uploaded by the same user, or they were mass-uploaded by a bot. In either case, this makes it easier to parse the information and add the {{information}} template automatically with a bot, thus avoiding painful manual work.
I invite you to take a look at the list of files at https://tools.wmflabs.org/mrmetadata/commons/commons/index.html and see if you can find such groups and patterns.
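To illustrate the kind of grouping I have in mind, here is a minimal sketch in Python. It assumes you have already collected (file title, uploader) pairs from the list by some means; the function name, the sample titles and the user names are all made up for the example, and this is not the script that generates the dashboard.

```python
from collections import defaultdict


def group_by_uploader(files, min_size=2):
    """Group (title, uploader) pairs by uploader.

    Returns a list of (uploader, titles) tuples, largest group first,
    keeping only uploaders with at least `min_size` files.
    """
    groups = defaultdict(list)
    for title, uploader in files:
        groups[uploader].append(title)
    # Keep only groups big enough to be worth a bot request.
    batches = {u: t for u, t in groups.items() if len(t) >= min_size}
    return sorted(batches.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical sample data, not real Commons files or users.
sample = [
    ("File:Example A 001.jpg", "ExampleBot"),
    ("File:Example A 002.jpg", "ExampleBot"),
    ("File:Example A 003.jpg", "ExampleBot"),
    ("File:Holiday photo.jpg", "ExampleUser1"),
    ("File:Map 1.png", "ExampleUser2"),
    ("File:Map 2.png", "ExampleUser2"),
]

for uploader, titles in group_by_uploader(sample):
    print(uploader, len(titles))
```

Uploaders with many files in the list are the most likely candidates for a single bot run; one-off uploads are filtered out by the `min_size` threshold.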
Once you identify a pattern, you're encouraged to add a section to the Bot Requests page on Commons, so that a bot owner can fix them: https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Adding_the_Inf...
I believe we can make a lot of progress rapidly if we dive into the list of files and fix all the groups we can find. The list and statistics will be updated daily so it'll be easy to see our progress.
Let me know if you'd like to help but are unsure how!
[1] https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
[2] https://blog.wikimedia.org/2014/11/07/cleaning-up-file-metadata-for-humans-a...
[3] https://tools.wmflabs.org/mrmetadata/
[4] https://tools.wmflabs.org/mrmetadata/commons/commons/index.html