As a follow-up to my earlier check of image licensing, I wrote a script to evaluate how accurately the license tags are being applied to images. It pops up an image, the description text, a list of templates, and a list of pages using the image, and asks how each template applies to the image. It gives options of "is correct", "is incorrect", "makes an incorrect fair-use claim", and "is not a license template". For templates other than self-creation templates, I classified a template as "incorrect" if it clearly did not apply to the image (e.g. "no rights reserved" on something where the source site says "all rights reserved", "logo" on something that isn't a logo) or if there was not enough information to evaluate its correctness (e.g. unsourced images). An image was considered "not fair use" if a quick check showed that it violated anything in the "policy" section of [[Wikipedia:Fair use]], or if it was similar to anything in the "counterexamples" section. Self-creation templates were considered "incorrect" if the image didn't look self-created to me and the description text was not sufficient to convince me of self-creation (e.g. sports figures in action poses, landscape images with borders, thumbnail images in general).
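For anyone wanting to repeat this kind of check, the four answers and the per-template bookkeeping can be sketched roughly as follows. This is not the actual script: the `Verdict` enum, the `tally` function, and the sample reviews are all hypothetical names and data made up for illustration.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    # The four answers the reviewer can give for each template on an image.
    CORRECT = "is correct"
    INCORRECT = "is incorrect"
    BAD_FAIR_USE = "makes an incorrect fair-use claim"
    NOT_LICENSE = "is not a license template"


def tally(reviews):
    """Count verdicts per template, skipping non-license templates.

    `reviews` is an iterable of (template_name, Verdict) pairs.
    Returns a dict mapping template name to a Counter of verdicts.
    """
    counts = {}
    for template, verdict in reviews:
        if verdict is Verdict.NOT_LICENSE:
            continue  # not a licensing claim, so it says nothing about accuracy
        counts.setdefault(template, Counter())[verdict] += 1
    return counts


# Hypothetical sample reviews, not real data from the survey.
sample = [
    ("logo", Verdict.CORRECT),
    ("logo", Verdict.INCORRECT),
    ("magazinecover", Verdict.BAD_FAIR_USE),
    ("cleanup", Verdict.NOT_LICENSE),
]
result = tally(sample)
```

Keeping the counts per template is what makes it possible to say which tags are misused most often, rather than only giving an overall error rate.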
After wading through 500 of the 1866 images evaluated earlier, I've got some results.
Brief summary: We are in *deep* shit.
Most misused tag: {{NoRightsReserved}}. Only two of 31 images so tagged seem to be correct. CopyrightedFreeUse (0 of 2 correct) and PD-ineligible (0 of 5 correct) are also well up there, but the small sample size may be misleading.
Most misused fair-use claim: {{magazinecover}}, with 20 out of 21 images being misused (Sports Illustrated covers being used to provide images of sports figures), while the 21st image was incorrectly tagged. {{Videotapecover}} (2 of 3 misused, used to illustrate porn stars) and {{film-screenshot}} (5 of 8 misused, used to illustrate actors) are runners-up. {{tv-screenshot}} is also a problem, with 11 of 16 images being used to illustrate episode lists.
Untagged images: 67 of 500 images had no templates whatsoever on them. Fewer than half that number had been tagged as "no source" or "no license". I get the feeling that the image-tagging project is falling behind here.
Most accurately-used tag: {{logo}}, with 33 of 40 images correctly tagged and used. The rest consisted mostly of unused logo images. Album covers also did fairly well, with 10 out of 14 being used only to illustrate an article on the album. Less-common tags also did well for correctness: {{PD-user}}, {{PD-because}}, and the Creative Commons tags were all invariably used correctly, but in small numbers.
Self-creation tags are something of a problem: 11 of 64 GFDL-self tags were clearly incorrect, as were 7 of 48 PD-self tags.
Overall, 239 tags were applied correctly, 90 were applied incorrectly, and 65 fair-use claims were clearly invalid.
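To put those totals in proportion, here is a small sketch that turns them into percentages. It assumes the three categories do not overlap, which the figures above do not state outright, and the variable names are just for illustration. Note that these are tag judgments, not images: an image can carry several templates, and the untagged images contribute none.

```python
# Overall totals as reported above.
totals = {
    "correct": 239,
    "incorrect": 90,
    "invalid fair-use claim": 65,
}

# Assumption: the three categories are disjoint, so they can be summed.
judged = sum(totals.values())

# Share of each category, as a percentage rounded to one decimal place.
shares = {name: round(100 * count / judged, 1) for name, count in totals.items()}

for name, pct in shares.items():
    print(f"{name}: {totals[name]} of {judged} ({pct}%)")
```

Under that assumption there are 394 judgments in all, of which about 60.7% are correct, 22.8% incorrect, and 16.5% invalid fair-use claims.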
Observations from this effort that may be applicable to attempts at getting the image situation under control:
1) Uploaders are more interested in getting images into articles than they are in doing the right thing. Sometimes an incorrect tag is clearly the result of good-faith ignorance, but more often it's the result of not giving a damn.
2) Checking images is a very time-consuming process. Even with software assistance in finding images and highlighting the most important information, and only doing the most cursory checks, I can only check around 150 images an hour. I'd estimate that if I were to do a proper job of checking, I could do around 80 images an hour. The English Wikipedia has somewhere over 250,000 images right now, and around 2000 are being uploaded every day.
3) People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as that "educational use" allows anything.
4) Tags that are not in the upload menu are almost always used correctly.
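The arithmetic behind the second observation can be worked through in a few lines. This is a back-of-the-envelope sketch using only the figures quoted above, treating "somewhere over 250,000" as exactly 250,000 and assuming a single checker working at a constant rate; the names are illustrative.

```python
BACKLOG_IMAGES = 250_000   # "somewhere over 250,000", taken as a lower bound
UPLOADS_PER_DAY = 2_000    # approximate daily upload rate
RATES_PER_HOUR = {"cursory": 150, "proper": 80}  # images checked per hour


def workload(rate_per_hour):
    """Hours needed for one checker working at a constant rate."""
    return {
        "backlog_hours": BACKLOG_IMAGES / rate_per_hour,
        "hours_per_day_to_keep_up": UPLOADS_PER_DAY / rate_per_hour,
    }


estimates = {name: workload(rate) for name, rate in RATES_PER_HOUR.items()}

for name, est in estimates.items():
    print(f"{name}: {est['backlog_hours']:.0f} hours for the backlog, "
          f"{est['hours_per_day_to_keep_up']:.1f} hours/day for new uploads")
```

At the cursory rate that is roughly 1,667 hours for the existing images plus about 13.3 hours a day for new uploads. At the proper rate it is 3,125 hours for the backlog, and keeping up with new uploads alone would take 25 hours a day, which one person cannot do.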
-- Mark [[User:Carnildo]]