Hi,
Thank you for sharing your thoughts, Jarek :)
On Friday, 12 December 2014, 03:44:54, Tuszynski, Jarek W. wrote:
So all the files in Category:Files with no machine-readable license
<https://commons.wikimedia.org/wiki/Category:Files_with_no_machin…eadable_license>
need work to be done with licenses, not files. I do not know what
machine-readable metadata is needed but I can help with adding them.
Yes, many of those are tricky because there isn't necessarily a "real" license
attached to them (example:
https://commons.wikimedia.org/wiki/File:%22A_Basket_full_of_Wool%22_(6360159381).jpg )
or the license isn't specific enough.
There are similar discussions at
https://meta.wikimedia.org/wiki/Talk:File_metadata_cleanup_drive#How_to_han…
and
https://meta.wikimedia.org/wiki/Talk:File_metadata_cleanup_drive#.22Presume…
and the best we might be able to do is to come up with a list of such cases
and ask our wonderful lawyers how to handle them :)
2) Your number of files missing machine-readable metadata on Commons:
~533,000, seems a bit low. According to Special:MostTranscludedPages
<https://commons.wikimedia.org/wiki/Special:…tTranscludedPages>
there are 24,136,218 files with licenses ({{License template tag}}
<https://commons.wikimedia.org/wiki/Template:License_template_tag>),
and 23,452,741 files with infobox templates ({{Information}} or
{{Infobox template tag}}
<https://commons.wikimedia.org/wiki/Template:Infobox_template_tag>),
so I would expect 683,477 files without any infobox templates.
There are currently ~677,674 files* without any of the following templates:
'Information', 'Painting', 'Blason-fr-en', 'Blason-fr-en-it', 'Blason-xx',
'COAInformation', 'Artwork', 'Art_Photo', 'Photograph', 'Book', 'Map',
'Musical_work', 'Specimen'
If this list is incomplete (it probably is) or incorrect, let me know.
*Source: https://tools.wmflabs.org/mrmetadata/commons_list.txt (warning, 18 MB
text file).
But some of those do have machine-readable metadata picked up by
CommonsMetadata even if they don't have an infobox, which brings the number
down to ~533,000. They may use templates we're not listing yet, or they may
carry machine-readable metadata in their EXIF data. Some of the latter are
false positives, per
https://phabricator.wikimedia.org/T73719
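To make the template check above concrete, here is a minimal sketch of how a file description page could be tested against the list of infobox templates. It is an illustration only, not the code behind the mrmetadata tool: the function names (`has_infobox`, `files_missing_infobox`) are hypothetical, the regular expression only looks at top-level `{{Name` openings, and it assumes template names compare as MediaWiki titles usually do (underscores equal spaces, first letter case-insensitive). It does not look at EXIF data or at what CommonsMetadata reports.

```python
import re

# Template names copied from the list in this message (known to be incomplete).
INFOBOX_TEMPLATES = [
    "Information", "Painting", "Blason-fr-en", "Blason-fr-en-it", "Blason-xx",
    "COAInformation", "Artwork", "Art_Photo", "Photograph", "Book", "Map",
    "Musical_work", "Specimen",
]


def normalize(name):
    """Treat underscores as spaces and upper-case the first letter (assumption)."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


KNOWN = {normalize(n) for n in INFOBOX_TEMPLATES}

# Captures the name right after "{{", up to the first "|", "}" or newline.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|\n]+?)\s*(?=[|}\n])")


def has_infobox(wikitext):
    """Return True if the wikitext opens at least one of the listed templates."""
    return any(normalize(m) in KNOWN for m in TEMPLATE_RE.findall(wikitext))


def files_missing_infobox(pages):
    """Given {title: wikitext}, return the titles with none of the templates."""
    return [title for title, text in pages.items() if not has_infobox(text)]
```

For example, `files_missing_infobox({"A.jpg": "{{Information|description=x}}", "B.jpg": "{{PD-old}}"})` would flag only `B.jpg`. Files flagged this way would still need the second pass described above, since some of them carry usable metadata elsewhere.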
3) As I mentioned on Commons:Bots/Work_requests#An_example_pattern
<https://commons.wikimedia.…/wiki/Commons:Bots/Work_requests#An_example_pattern>
I would like to first give the original uploaders a chance to fix the files.
We can do that by writing a standard message which, without any threat of
deletion, asks for help with bringing their files up to current standards.
I'm not opposed to this in principle, but I'm not sure I see the value. We're
not going to delete files, or change attribution, or anything like that; we're
only going to take the existing information and put it into a template so it's
easier to access.
My assumption is that most uploaders wouldn't care about such a change in
formatting, and that it would entail more work for them to figure out how to do
it themselves, than for a few bot owners to do it on a wider scale.
Is this assumption unreasonable?
4) At some point I started adding such files to
[[Category:Media missing infobox template]]
<https://commons.wikimedia.org/wiki/Category:Media_missing_infob…template>
for better tracking and started sub-categorizing them into
a. Files with OTRS
b. Files with {{information}} template which have some parsing issues
c. Files with {{PD-Art}} which should use {{Artwork}} template and
where the name of the uploader, upload date, and even source might not be
relevant
d. Files using a PD license, like PD-old (except PD-Author or PD-User):
for those files too, the name of the uploader, upload date, and
even source might not be relevant
It might be easier to add infoboxes for different groups of files. For
example Magnus' add_information.php
<http://toolserver.org/%7Emagnus/add_information.php&…>
tool does not work well for artworks. We also seem to have users that
specialize in different subjects and it might be easier to get their
attention with smaller groups of files of one type.
Thank you for doing this! I think these will be great starting points for
specific bot runs :)
--
Guillaume Paumier