OK, some of the Commons license templates are more solid than others, but the file you
refer to uses the license Template:Flickr-no known copyright restrictions [1]
<https://commons.wikimedia.org/wiki/Template:Flickr-no_known_copyright_restrictions>.
The deletion of that template was discussed to death here
<https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Templa…> [2],
but there was no consensus. It would be good to have a list of such templates. A
query searching for templates used by files in that category which transclude {{License
template tag}} should do it, but I do not think I can create it with the CatScan3 tool. We
do have ~1.5k license templates <https://commons.wikimedia.org/wiki/User:Jarekt/f> [3]
(that number includes customized templates built from more generic ones), and some of
them are very rarely used, so it would be good to look at them again.
About asking uploaders to add infoboxes: this idea comes from two things. One is a desire
to get more people involved, since uploaders are often interested in improving their
files; the other is a desire to simplify the life of bot writers. I do not think it is
possible to write a bot that always gets it right. For example, the file
https://commons.wikimedia.org/wiki/File:AJ_3101_ant.jpg
just says {{GFDL}} and what is in the image. There is no information about who took the
picture, or even whether the uploader thought the GFDL applied to the subject of the
photo or to the photograph itself. The same goes for
https://commons.wikimedia.org/wiki/File:Ajokoirat.png
I do not know if it is GFDL because it was copied from a website claiming GFDL or because
the author who uploaded it chose that license. By the way, those files definitely do not
meet current standards, but in 2006 they were not unusual. If any of those uploaders are
still around, it would be nice if they could clean them up, because we cannot guess those
things.
Jarek T.
(user:Jarekt)
[1] https://commons.wikimedia.org/wiki/Template:Flickr-no_known_copyright_restr…
[2] https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Template:Flick…
[3] https://commons.wikimedia.org/wiki/User:Jarekt/f
-----Original Message-----
From: Guillaume Paumier [mailto:gpaumier@wikimedia.org]
Sent: Friday, December 12, 2014 3:57 PM
To: commons-l(a)lists.wikimedia.org
Cc: Tuszynski, Jarek W.; Coordination of technology deployments across
languages/projects
Subject: Re: [Commons-l] File metadata cleanup drive: We now have numbers for Commons
Hi,
Thank you for sharing your thoughts, Jarek :)
On Friday, 12 December 2014, 03:44:54, Tuszynski, Jarek W. wrote:
> So all the files in Category:Files with no machine-readable license need work
> to be done with licenses, not files. I do not know what machine-readable
> metadata is needed but I can help with adding them.
Yes, many of those are tricky because there isn't necessarily a "real"
license attached to them (example:
https://commons.wikimedia.org/wiki/File:%22A_Basket_full_of_Wool%22_(6360159381).jpg )
or the license isn't specific enough.
There are similar discussions at
https://meta.wikimedia.org/wiki/Talk:File_metadata_cleanup_drive#How_to_han…
and
https://meta.wikimedia.org/wiki/Talk:File_metadata_cleanup_drive#.22Presume…
and the best we might be able to do is to come up with a list of such cases and ask our
wonderful lawyers how to handle them :)
> 2) Your number of files missing machine-readable metadata on Commons:
> ~533,000, seems a bit low. According to Special:MostTranscludedPages there
> are 24,136,218 files with licenses ({{License template tag}}), and
> 23,452,741 files with infobox templates ({{Information}} or
> {{Infobox template tag}}), so I would expect 683,477 files without any
> infobox templates.
There are currently ~677,674 files* without any of the following templates:
'Information','Painting', 'Blason-fr-en',
'Blason-fr-en-it', 'Blason-xx', 'COAInformation',
'Artwork', 'Art_Photo','Photograph', 'Book',
'Map', 'Musical_work', 'Specimen'
If this list is incomplete (it probably is) or incorrect, let me know.
*Source:
https://tools.wmflabs.org/mrmetadata/commons_list.txt (warning, 18MB text
file).
But some of those do have machine-readable metadata picked up by CommonsMetadata even if
they don't have an infobox, which brings the number down to ~533,000. It can be that
they have templates we're not listing yet, or that they have MR metadata in their EXIF
data. Some of the latter are false positives, per
https://phabricator.wikimedia.org/T73719
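
As a rough illustration of the count described above, here is a minimal sketch that flags files carrying none of the listed infobox templates. It is not the script behind the published numbers: the `templates_by_file` mapping, the sample file titles, and the `normalize` helper are all hypothetical, and the name normalization (underscores vs. spaces, first-letter case) is an assumption about how template names would be compared.

```python
# Infobox templates from the list above (acknowledged there as probably incomplete).
INFOBOX_TEMPLATES = {
    "Information", "Painting", "Blason-fr-en", "Blason-fr-en-it", "Blason-xx",
    "COAInformation", "Artwork", "Art_Photo", "Photograph", "Book", "Map",
    "Musical_work", "Specimen",
}


def normalize(name):
    """Hypothetical helper: treat spaces and underscores alike, ignore first-letter case."""
    name = name.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


KNOWN = {normalize(t) for t in INFOBOX_TEMPLATES}


def files_without_infobox(templates_by_file):
    """Return the titles of files that transclude none of the known infobox templates."""
    return sorted(
        title
        for title, templates in templates_by_file.items()
        if not KNOWN & {normalize(t) for t in templates}
    )


# Hypothetical sample data: file title -> templates transcluded on its page.
sample = {
    "File:Example_a.jpg": ["GFDL"],
    "File:Example_b.jpg": ["information", "Cc-by-sa-3.0"],
    "File:Example_c.jpg": ["Art Photo", "PD-Art"],
}
missing = files_without_infobox(sample)
print(missing)

# The estimate quoted earlier: files with a license template minus files with an infobox.
expected_gap = 24_136_218 - 23_452_741
print(expected_gap)
```

A set intersection keeps the check cheap per file, and extending the template list only means adding names to `INFOBOX_TEMPLATES`.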
> 3) As I mentioned on Commons:Bots/Work_requests#An_example_pattern
> <https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#An_example_pattern>
> I would like to first give the original uploaders a chance to fix the files.
> We can do that by writing a standard message which, without any threat of
> deletion, asks for help with bringing their files up to current standards.
I'm not opposed to this in principle, but I'm not sure I see the value. We're
not going to delete files, or change attribution, or anything like that; we're only
going to take the existing information and put it into a template so it's easier to
access.
My assumption is that most uploaders wouldn't care about such a change in formatting,
and that it would entail more work for them to figure out how to do it themselves, than
for a few bot owners to do it on a wider scale.
Is this assumption unreasonable?
> 4) At some point I started adding such files to
> [[Category:Media missing infobox template]] for better tracking and started
> sub-categorizing them into
>   a. Files with OTRS
>   b. Files with {{information}} template which have some parsing issues
>   c. Files with {{PD-Art}} which should use {{Artwork}} template and where the
>      name of the uploader, upload date, and even source might not be relevant
>   d. Files using PD license, like PD-old (except PD-Author or PD-User): for
>      those files also the name of the uploader, upload date, and even source
>      might not be relevant
> It might be easier to add infoboxes for different groups of files. For
> example, Magnus' tool does not work well for artworks. We also seem to have
> users that specialize in different subjects and it might be easier to get
> their attention with smaller groups of files of one type.
Thank you for doing this! I think these will be great starting points for specific bot
runs :)
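
To make the grouping above concrete, here is a small sketch of how a bot might sort a file into groups a to d from the templates on its page. It is only an illustration under stated assumptions: the `subcategory` function, its `has_parsing_issue` flag, the `OTRS_TEMPLATES` set (including the template name `PermissionOTRS`), and the precedence order between groups are not taken from the thread.

```python
# Assumption: OTRS-confirmed files carry a template such as "PermissionOTRS".
OTRS_TEMPLATES = {"PermissionOTRS"}

# PD templates explicitly excluded from group d in the list above.
EXCLUDED_PD = {"PD-Author", "PD-User"}


def subcategory(templates, has_parsing_issue=False):
    """Assign a file to one of the groups a-d; the order of the checks is an assumption."""
    names = {t.strip() for t in templates}
    if OTRS_TEMPLATES & names:
        return "a"  # files with OTRS
    if "Information" in names and has_parsing_issue:
        return "b"  # {{Information}} present but not parsed cleanly
    if "PD-Art" in names:
        return "c"  # should use {{Artwork}}
    if any(n.startswith("PD-") and n not in EXCLUDED_PD for n in names):
        return "d"  # other PD licenses such as PD-old
    return "unsorted"


# Hypothetical examples.
print(subcategory(["PermissionOTRS", "GFDL"]))
print(subcategory(["Information", "GFDL"], has_parsing_issue=True))
print(subcategory(["PD-Art", "PD-old"]))
print(subcategory(["PD-old"]))
print(subcategory(["PD-User"]))
```

Whether a file with both {{PD-Art}} and another PD template belongs in group c or d is a policy question; the sketch simply checks c first.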
--
Guillaume Paumier