[WikiEN-l] Image tagging: 33 months later

Mark Wagner carnildo at gmail.com
Tue Dec 23 05:53:17 UTC 2008


Back in March of 2006, I did a check of image uploading.  The results
were, to put it bluntly, appalling.

I've re-done the check with a new batch of 1,945 images.  This covers
a little over two days' uploading, where the original set was 1,866
images uploaded in a little over 24 hours.

For 1,945 images uploaded and not later deleted, 1,960 license tags
were applied.

858 images, or 44%, were tagged with a non-free content tag, up from
40% in 2006, with album covers and logos making up slightly more than
half of those.  The vast numbers of promotional photos that were
uploaded in 2006 are nowhere to be seen: only 20 images were so tagged.

At least 917 images (47%) were tagged with a free-content license tag,
up from 41% in 2006.  The most popular tags are PD-Self (334 images),
GFDL (250 images), and Creative Commons Attribution-Sharealike (221
images).

Only 176 images (9%) did not have a license tag, a vast improvement
over 2006, when 26% were untagged.
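
As a quick check of the arithmetic, the percentages above can be
recomputed from the raw counts.  This is only a minimal sketch: the
variable and function names are illustrative and are not part of the
original survey.

```python
# Counts taken from the figures quoted above (December 2008 sample).
TOTAL_IMAGES = 1945

counts = {
    "non-free": 858,
    "free": 917,      # reported as "at least 917", so a lower bound
    "untagged": 176,
}


def share(count, total=TOTAL_IMAGES):
    """Return count as a percentage of total, rounded to a whole percent."""
    return round(100 * count / total)


shares = {name: share(n) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} images, {pct}%")
```

The three counts add up to 1,951, slightly more than the 1,945 images.
That is presumably because some images carry more than one tag (1,960
tags were applied in total).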

500 of the images were checked for tag correctness.  Things are
looking *much* better than they were in March 2006: of the 494 tags
applied, 35 (7%) were clearly incorrect, and 34 invalid fair-use
claims were made.  In 2006, the error rates were 22% incorrect and 16%
invalid fair-use claims.

The most-misused tag by count is the self-creation tag (at least 21
images were not self-created), with the GFDL/CC-BY-SA-3.0 dual-license
tag especially problematic.  By proportion, the most-misused tag is
CC-BY-3.0 (5 out of 12 uses were incorrect).

On the non-free content side of things, the problematic tags are
{{non-free television screenshot}} (6 out of 10 used to illustrate a
person's biography), {{non-free audio sample}} (3 out of 4 samples
were over-long), and {{non-free promotional}} (2 out of 3 images were
clearly replaceable).  As before, album covers and logos tended to be
used correctly (74 out of 84 and 46 out of 57, respectively).

28 out of 254 free-content tags were incorrect, compared to 7 out of
205 non-free-content tags.  Breaking non-free content down by type of
media and getting rid of the generic "fair use" tags ("promotional",
"fair use", etc.) seems to have worked wonderfully.
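
To put the two error counts on a common footing, they can be converted
to rates.  Again this is only a sketch with made-up names; the numbers
are the ones quoted above.

```python
# (incorrect tags, tags checked), as quoted in the paragraph above.
free_content = (28, 254)
non_free_content = (7, 205)


def error_rate(incorrect, checked):
    """Return the percentage of checked tags that were incorrect."""
    return 100 * incorrect / checked


free_rate = error_rate(*free_content)
non_free_rate = error_rate(*non_free_content)

print(f"free-content tags:     {free_rate:.1f}% incorrect")
print(f"non-free-content tags: {non_free_rate:.1f}% incorrect")
print(f"ratio: {free_rate / non_free_rate:.1f}x")
```

On these figures, free-content tags were wrong about 11% of the time,
against roughly 3% for non-free-content tags.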

We still need to do something about people uploading images with
incorrect information, but it's far less of a problem than it used to
be.

-- 
Mark
[[User:Carnildo]]


