[WikiEN-l] Image tagging followup

Mark Wagner carnildo at gmail.com
Fri Mar 10 08:13:35 UTC 2006


As a followup to my earlier check of image licensing, I wrote a script to
evaluate how accurately the license tags are being applied to images.  It
pops up an image, the description text, a list of templates, and a list of
pages using the image, and asks how the templates apply to the image.  It
gives options of "is correct", "is incorrect", "makes an incorrect fair-use
claim", and "is not a license template".  For templates other than
self-creation templates, I classified it as "incorrect" if the template
clearly did not apply to the image (e.g. "no rights reserved" on something
where the source site says "all rights reserved", "logo" on something that
isn't a logo) or if there was not enough information to evaluate the
correctness (i.e. unsourced images).  An image was considered "not fair use"
if a quick check showed that it violated anything in the "policy" section of
[[Wikipedia:Fair use]], or if it was similar to anything in the
"counterexamples" section.  Self-creation templates were considered
"incorrect" if the image didn't look self-created to me, and the description
text was not sufficient to convince me of self-creation (e.g. sports figures
in action poses, landscape images with borders, thumbnail images in
general).
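
To make the four verdict options concrete, here is a minimal sketch in
Python of how such answers could be recorded and tallied per template.  It is
not the actual script: the names Verdict and tally are hypothetical, and the
sample reviews are made up purely for illustration.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    """The four answers the reviewer can give for a template (from the post)."""
    CORRECT = "is correct"
    INCORRECT = "is incorrect"
    BAD_FAIR_USE = "makes an incorrect fair-use claim"
    NOT_A_LICENSE = "is not a license template"


def tally(reviews):
    """Count verdicts per template from (template_name, Verdict) pairs."""
    counts = {}
    for template, verdict in reviews:
        counts.setdefault(template, Counter())[verdict] += 1
    return counts


# Made-up demo data, not results from the survey.
reviews = [
    ("logo", Verdict.CORRECT),
    ("logo", Verdict.INCORRECT),
    ("magazinecover", Verdict.BAD_FAIR_USE),
    ("cleanup", Verdict.NOT_A_LICENSE),
]
counts = tally(reviews)
for template, per_verdict in counts.items():
    print(template, {v.value: n for v, n in per_verdict.items()})
```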

After wading through 500 of the 1866 images evaluated earlier, I've got some
results.

Brief summary: We are in *deep* shit.

Most misused tag: {{NoRightsReserved}}.  Only two of 31 images so tagged
seem to be correct.  CopyrightedFreeUse (0 for 2) and PD-ineligible (0 for
5) are also well up there, but the small sample size may be misleading.
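
One way to see how much the small samples matter is to put an interval
around each "correct" rate.  The sketch below uses the Wilson score interval
at roughly 95% confidence (z = 1.96); that choice of method is an assumption
made for illustration, not something used in the original check, and
wilson_interval is a hypothetical helper name.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95%."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Counts of correctly tagged images taken from the paragraph above.
samples = {
    "NoRightsReserved": (2, 31),
    "CopyrightedFreeUse": (0, 2),
    "PD-ineligible": (0, 5),
}
for tag, (ok, n) in samples.items():
    lo, hi = wilson_interval(ok, n)
    print(f"{tag}: {ok}/{n} correct, interval {lo:.1%} to {hi:.1%}")
```

Under these assumptions, the interval for NoRightsReserved stays low even at
its upper end, while the interval for PD-ineligible (0 for 5) stretches to
above 40%, which is why the smaller counts should be read with caution.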

Most misused fair-use claim: {{magazinecover}}, with 20 out of 21 images
being misused (Sports Illustrated covers being used to provide images of
sports figures), while the 21st image was incorrectly tagged.
{{Videotapecover}} (2 for 3, used to illustrate porn stars) and
{{film-screenshot}} (5 for 8, used to illustrate actors) are runners-up.
{{tv-screenshot}} is also a problem, with 11 of 16 images being used to
illustrate episode lists.

Untagged images: 67 of 500 images had no templates whatsoever on them.  Fewer
than half that number had been tagged as "no source" or "no license".  I get
the feeling that the image-tagging project is falling behind here.

Most accurately used tag: {{logo}}, with 33 of 40 images correctly tagged
and used.  The rest consisted mostly of unused logo images.  Album covers
also did fairly well, with 10 out of 14 being used only to illustrate an
article on the album.  Less-common tags also did well for correctness:
{{PD-user}}, {{PD-because}}, and the Creative Commons tags were all
invariably used correctly, but in small numbers.

Self-creation tags are something of a problem: 11 of 64 GFDL-self tags were
clearly incorrect, as were 7 of 48 PD-self tags.

Overall, 239 tags were applied correctly, 90 were applied incorrectly, and
65 fair-use claims were clearly invalid.
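
For a sense of proportion, the three totals can be turned into shares.  The
short Python sketch below assumes the three counts are mutually exclusive and
together cover every evaluated license tag, which the figures above do not
state explicitly; the variable names are illustrative.

```python
# Totals from the paragraph above.
totals = {
    "applied correctly": 239,
    "applied incorrectly": 90,
    "invalid fair-use claim": 65,
}

# Assumption: the three categories do not overlap.
evaluated = sum(totals.values())
shares = {label: 100 * n / evaluated for label, n in totals.items()}

print(f"tags evaluated: {evaluated}")
for label, pct in shares.items():
    print(f"{label}: {totals[label]} ({pct:.1f}%)")
```

On that assumption, roughly three in five evaluated tags were correct, and
about two in five were either wrong or carried an invalid fair-use claim.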

Observations from this effort that may be applicable for attempts at getting
the image situation under control:
1) Uploaders are more interested in getting images into articles than they
are in doing the right thing.  Sometimes an incorrect tag is clearly the
result of good-faith ignorance, but more often it's the result of not giving
a damn.
2) Checking images is a very time-consuming process.  Even with software
assistance in finding images and highlighting the most important
information, and only doing the most cursory checks, I can only check around
150 images an hour.  I'd estimate that if I were to do a proper job of
checking, I could do around 80 images an hour.  The English Wikipedia has
somewhere over 250,000 images right now, and around 2000 are being uploaded
every day.
3) People don't understand "fair use".  The vast majority of uploaders think
"fair use" means "I think it's fair that this Wikipedia article should be
illustrated", while those who are aware that it's part of copyright law tend
to have mistaken ideas, such as the belief that "educational use" allows
anything.
4) Tags that are not in the upload menu are almost always used correctly.
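
The workload in point 2 can be made concrete with a little arithmetic.  The
Python sketch below uses only the figures quoted there (about 250,000
existing images, about 2000 uploads a day, and 150 or 80 images an hour); it
assumes a single reviewer working at a constant rate, and hours_needed is a
hypothetical helper name.

```python
TOTAL_IMAGES = 250_000   # "somewhere over 250,000 images"
DAILY_UPLOADS = 2_000    # "around 2000 are being uploaded every day"
RATES_PER_HOUR = {"cursory check": 150, "proper check": 80}


def hours_needed(images, images_per_hour):
    """Reviewer-hours to check a given number of images at a constant rate."""
    return images / images_per_hour


for label, rate in RATES_PER_HOUR.items():
    backlog = hours_needed(TOTAL_IMAGES, rate)
    daily = hours_needed(DAILY_UPLOADS, rate)
    print(f"{label}: backlog {backlog:,.0f} hours, "
          f"new uploads {daily:.1f} hours per day")
```

Even at the cursory rate, one day's uploads take more than 13 hours to check,
and at the careful rate they take 25 hours, so a single reviewer cannot keep
up with new uploads, let alone the existing backlog.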

--
Mark
[[User:Carnildo]]


