Guillaume,
A few thoughts on the file metadata cleanup drive:
1)
https://tools.wmflabs.org/mrmetadata/commons/commons/index.html and Category:Files
with no machine-readable
license<https://commons.wikimedia.org/wiki/Category:Files_with_no_machin…
show ~78k files without a machine-readable license. A couple of years back we had a big
push to make sure that all files on Commons have licenses, and we managed to fix all the
files without them (they were mostly files where the license was lost somehow or was not
using one of the standard templates). Ever since, we have had a bot checking the database
from time to time and adding files without a license to Category:Media without a license:
needs history
check<https://commons.wikimedia.org/wiki/Category:Media_without_a_licens…ck>.
New uploads get the {{no license}} template and have a week for a license to be added; old
uploads, which likely lost a license somehow, are processed manually. There are 29 files
there now; all the other files on Commons do have a license or are tagged with
{{no license}} or a similar template. So the files in Category:Files with no machine-readable
license<https://commons.wikimedia.org/wiki/Category:Files_with_no_machin…
need work done on their license templates, not on the files themselves. I do not know what
machine-readable metadata is needed, but I can help with adding it.
2) Your number of files missing machine-readable metadata on Commons, ~533,000,
seems a bit low. According to
Special:MostTranscludedPages<https://commons.wikimedia.org/wiki/Special:…
there are 24,136,218 files with licenses ({{License template
tag<https://commons.wikimedia.org/wiki/Template:License_template_tag>}}) and
23,452,741 files with infobox templates ({{Information}} or {{Infobox template
tag<https://commons.wikimedia.org/wiki/Template:Infobox_template_tag>}}), so I
would expect 683,477 files without any infobox template.
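The estimate is simply the difference between the two transclusion counts. As a quick check in Python (this assumes every file with an infobox template also carries a license template, which I have not verified):

```python
# Transclusion counts quoted above from Special:MostTranscludedPages.
files_with_license = 24_136_218   # {{License template tag}}
files_with_infobox = 23_452_741   # {{Information}} or {{Infobox template tag}}

# Assumption: files with an infobox are a subset of files with a license,
# so the difference approximates licensed files that lack any infobox.
files_without_infobox = files_with_license - files_with_infobox
print(f"{files_without_infobox:,}")  # 683,477
```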
3) As I mentioned on
Commons:Bots/Work_requests#An_example_pattern<https://commons.wikimedia.…
I would like to first give the original uploaders a chance to fix the files. We can do
that by writing a standard message which, without any threat of deletion, asks for help
with bringing their files up to current standards. We should have one message per uploader
with a list of all the files that need infoboxes. We should also advise them on the use of
the VisualFileChange gadget or on requesting specific tasks to be done by bots at
Commons:Bots/Work requests. The VisualFileChange gadget by user:Rillke does have an option
“Prepend text, notify uploaders” which does almost what I need (one message per uploader),
but I would prefer Python code.
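To make the "one message per uploader" idea concrete, here is a minimal Python sketch. It only groups files and builds the message text; fetching the file list and posting to talk pages are left out. The function names, the sample data, and the message wording are all hypothetical, not an agreed template:

```python
from collections import defaultdict


def group_by_uploader(file_uploader_pairs):
    """Map each uploader to the sorted, de-duplicated list of their files."""
    grouped = defaultdict(list)
    for title, uploader in file_uploader_pairs:
        grouped[uploader].append(title)
    return {uploader: sorted(set(titles)) for uploader, titles in grouped.items()}


def build_message(uploader, titles):
    """Return wikitext for a single talk-page message listing all files.

    The wording is a placeholder; it deliberately carries no deletion threat.
    """
    lines = [
        "== Files missing an infobox template ==",
        "Hello " + uploader + ", the following files you uploaded do not yet "
        "have an infobox template such as {{Information}}. Could you help "
        "bring them up to current standards?",
    ]
    lines += ["* [[:" + title + "]]" for title in titles]
    lines.append(
        "The VisualFileChange gadget or a request at "
        "[[Commons:Bots/Work requests]] may help with larger batches. ~~~~"
    )
    return "\n".join(lines)


# Hypothetical sample data: (file title, uploader) pairs.
pairs = [
    ("File:A.jpg", "Alice"),
    ("File:B.jpg", "Bob"),
    ("File:C.jpg", "Alice"),
]
grouped = group_by_uploader(pairs)
messages = {uploader: build_message(uploader, titles)
            for uploader, titles in grouped.items()}
```

Each uploader ends up with exactly one entry in `messages`, however many of their files are on the list.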
4) At some point I started adding such files to [[Category:Media missing infobox
template<https://commons.wikimedia.org/wiki/Category:Media_missing_infob…
for better tracking, and started sub-categorizing them into:
a. Files with OTRS
b. Files with {{information}} template which have some parsing issues
c. Files with {{PD-Art}} which should use {{Artwork}} template and where the name of
the uploader, upload date, and even source might not be relevant
d. Files using a PD license, like PD-old (except PD-Author or PD-User): for those files,
too, the name of the uploader, upload date, and even source might not be relevant
It might be easier to add infoboxes for different groups of files. For example Magnus'
add_information.php<http://toolserver.org/%7Emagnus/add_information.php&… tool does
not work well for artworks. We also seem to have users that specialize in different
subjects and it might be easier to get their attention with smaller groups of files of one
type.
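The sub-categorization above could be approximated by a simple check of each file's wikitext. This is only a rough Python sketch: the template-name patterns (for example `{{PermissionOTRS` and `{{OTRS`) and the group labels are my assumptions, and real pages would need more careful parsing:

```python
import re


def classify(wikitext):
    """Assign a file description page to one of the groups a-d, heuristically."""
    text = wikitext.lower()
    # a. Files with OTRS (assumed template-name prefixes).
    if "{{permissionotrs" in text or "{{otrs" in text:
        return "a: OTRS"
    # b. {{Information}} is present but was not recognized, so parsing failed.
    if "{{information" in text:
        return "b: Information with parsing issues"
    # c. PD-Art files, which should use {{Artwork}}; checked before generic PD.
    if "{{pd-art" in text:
        return "c: PD-Art"
    # d. Any other PD license, excluding PD-Author and PD-User.
    if re.search(r"\{\{pd-(?!author|user)", text):
        return "d: other PD"
    return "unsorted"


print(classify("{{PD-old}}"))
print(classify("{{PD-Art|PD-old-100}}"))
print(classify("{{PD-user|Example}}"))
```

The order of the checks matters: PD-Art is tested before the generic PD pattern so that artworks do not fall into group d.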
Jarek T.
(user:Jarekt)
-----Original Message-----
From: commons-l-bounces(a)lists.wikimedia.org [mailto:commons-l-bounces@lists.wikimedia.org]
On Behalf Of Guillaume Paumier
Sent: Thursday, December 11, 2014 2:16 PM
To: Coordination of technology deployments across languages/projects; Wikimedia Commons
Discussion List
Subject: [Commons-l] File metadata cleanup drive: We now have numbers for Commons
Greetings,
As many of you are aware, we're currently in the process of collectively adding
machine-readable metadata to many files and templates that don't have them, both on
Commons and on all other Wikimedia wikis with local uploads [1,2]. This makes it much
easier to see and re-use multimedia files consistently with best practices for attribution
across a variety of channels (offline, PDF exports, mobile platforms, MediaViewer,
WikiWand, etc.)
In October, I created a dashboard to track how many files were missing the
machine-readable markers on each wiki [3]. Unfortunately, due to the size of Commons, I
needed to find another way to count them there.
Yesterday, I finished implementing the script for Commons and started running it. As of
today, we have accurate numbers for the quantity of files missing machine-readable
metadata on Commons: ~533,000, out of ~24 million [4]. It may seem like a lot, but I
personally think it's a great testament to the dedication of the Commons community.
Now that we have numbers, we can work on going through those files and fixing them. Many
of them are missing the {{information}} template, but many of those are also part of a
batch: either they were uploaded by the same user, or they were mass-uploaded by a bot. In
either case, this makes it easier to parse the information and add the {{information}}
template automatically with a bot, thus avoiding painful manual work.
I invite you to take a look at the list of files at
https://tools.wmflabs.org/mrmetadata/commons/commons/index.html and see if you can find
such groups and patterns.
Once you identify a pattern, you're encouraged to add a section to the Bot Requests
page on Commons, so that a bot owner can fix them:
https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Adding_the_In…
I believe we can make a lot of progress rapidly if we dive into the list of files and fix
all the groups we can find. The list and statistics will be updated daily so it'll be
easy to see our progress.
Let me know if you'd like to help but are unsure how!
[1] https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
[2] https://blog.wikimedia.org/2014/11/07/cleaning-up-file-metadata-for-humans-…
[3] https://tools.wmflabs.org/mrmetadata/
[4] https://tools.wmflabs.org/mrmetadata/commons/commons/index.html
--
Guillaume Paumier