As a follow-up to my earlier check of image licensing, I wrote a script to evaluate how accurately the license tags are being applied to images. It pops up an image, the description text, a list of templates, and a list of pages using the image, and asks how the templates apply to the image. It gives options of "is correct", "is incorrect", "makes an incorrect fair-use claim", and "is not a license template". For templates other than self-creation templates, I classified it as "incorrect" if the template clearly did not apply to the image (e.g. "no rights reserved" on something where the source site says "all rights reserved", "logo" on something that isn't a logo) or if there was not enough information to evaluate the correctness (e.g. unsourced images). An image was considered "not fair use" if a quick check showed that it violated anything in the "policy" section of [[Wikipedia:Fair use]], or if it was similar to anything in the "counterexamples" section. Self-creation templates were considered "incorrect" if the image didn't look self-created to me, and the description text was not sufficient to convince me of self-creation (e.g. sports figures in action poses, landscape images with borders, thumbnail images in general).
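For readers who want to picture the bookkeeping, the four answers described above could be recorded roughly as follows. This is only a sketch and not the actual script: the names `Verdict`, `Review`, and `tally` are made up for illustration, and the only things taken from the message are the four option labels.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # The four options the script is described as offering.
    CORRECT = "is correct"
    INCORRECT = "is incorrect"
    BAD_FAIR_USE = "makes an incorrect fair-use claim"
    NOT_LICENSE = "is not a license template"


@dataclass(frozen=True)
class Review:
    image: str        # image title (hypothetical field name)
    template: str     # template name found on the description page
    verdict: Verdict  # the reviewer's answer for this template


def tally(reviews):
    """Count verdicts per template, skipping non-license templates."""
    counts = {}
    for review in reviews:
        if review.verdict is Verdict.NOT_LICENSE:
            continue
        counts.setdefault(review.template, Counter())[review.verdict] += 1
    return counts
```

Keeping one record per (image, template) pair rather than per image matters here, since a single image can carry several templates with different verdicts.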
After wading through 500 of the 1866 images evaluated earlier, I've got some results.
Brief summary: We are in *deep* shit.
Most misused tag: {{NoRightsReserved}}. Only two of 31 images so tagged seem to be correct. CopyrightedFreeUse (0 for 2) and PD-ineligible (0 for 5) are also well up there, but the small sample size may be misleading.
Most misused fair-use claim: {{magazinecover}}, with 20 out of 21 images being misused (Sports Illustrated covers being used to provide images of sports figures), while the 21st image was incorrectly tagged. {{Videotapecover}} (2 for 3, used to illustrate porn stars) and {{film-screenshot}} (5 for 8, used to illustrate actors) are runners-up. {{tv-screenshot}} is also a problem, with 11 of 16 images being used to illustrate episode lists.
Untagged images: 67 of 500 images had no templates whatsoever on them. Less than half that number had been tagged as "no source" or "no license". I get the feeling that the image-tagging project is falling behind here.
Most accurately-used tag: {{logo}}, with 33 of 40 images correctly tagged and used. The rest consisted mostly of unused logo images. Album covers also did fairly well, with 10 out of 14 being used only to illustrate an article on the album. Less-common tags also did well for correctness: {{PD-user}}, {{PD-because}}, and the Creative Commons tags were all invariably used correctly, but in small numbers.
Self-creation tags are something of a problem: 11 of 64 GFDL-self tags were clearly incorrect, as were 7 of 48 PD-self tags.
Overall, 239 tags were applied correctly, 90 were applied incorrectly, and 65 fair-use claims were clearly invalid.
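The per-tag figures above can be turned into rates with a few lines of Python. This is a rough sketch using only numbers quoted in this message; the names `reported`, `rate`, `ranked`, and `share` are illustrative, and treating the three overall counts as one total is an assumption, since the message does not say whether they are mutually exclusive.

```python
# (not flagged as incorrect, total) pairs as quoted in the message.
reported = {
    "NoRightsReserved": (2, 31),
    "CopyrightedFreeUse": (0, 2),
    "PD-ineligible": (0, 5),
    "logo": (33, 40),
    "GFDL-self": (64 - 11, 64),  # 11 of 64 were clearly incorrect
    "PD-self": (48 - 7, 48),     # 7 of 48 were clearly incorrect
}


def rate(ok, total):
    """Fraction of tags that were not flagged."""
    return ok / total


def ranked(data):
    """Tag names sorted from least to most accurately used."""
    return sorted(data, key=lambda tag: rate(*data[tag]))


# Overall counts; summing them into one total is an assumption.
overall = {"correct": 239, "incorrect": 90, "invalid_fair_use": 65}


def share(counts, key):
    """Fraction of all counted tags that fall under `key`."""
    return counts[key] / sum(counts.values())


for tag in ranked(reported):
    ok, total = reported[tag]
    print(f"{tag}: {ok}/{total} = {rate(ok, total):.0%}")
print(f"overall correct: {share(overall, 'correct'):.0%}")
```

As the message itself warns, the rates for tags with only two or five samples should not be read as more than a hint.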
Observations from this effort that may be applicable for attempts at getting the image situation under control:
1) Uploaders are more interested in getting images into articles than they are in doing the right thing. Sometimes an incorrect tag is clearly the result of good-faith ignorance, but more often it's the result of not giving a damn.
2) Checking images is a very time-consuming process. Even with software assistance in finding images and highlighting the most important information, and only doing the most cursory checks, I can only check around 150 images an hour. I'd estimate that if I were to do a proper job of checking, I could do around 80 images an hour. The English Wikipedia has somewhere over 250,000 images right now, and around 2000 are being uploaded every day.
3) People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
4) Tags that are not in the upload menu are almost always used correctly.
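To put the numbers in point 2 in perspective, here is the arithmetic spelled out. It is a back-of-the-envelope sketch for a single reviewer, using only the rates and counts quoted above; the constant and function names are illustrative.

```python
BACKLOG = 250_000        # "somewhere over 250,000 images"
UPLOADS_PER_DAY = 2_000  # "around 2000 are being uploaded every day"
CURSORY_RATE = 150       # images per hour, cursory check
PROPER_RATE = 80         # images per hour, proper check


def hours_needed(images, images_per_hour):
    """Reviewer-hours needed to check `images` at the given rate."""
    return images / images_per_hour


backlog_hours = hours_needed(BACKLOG, PROPER_RATE)
daily_hours_cursory = hours_needed(UPLOADS_PER_DAY, CURSORY_RATE)
daily_hours_proper = hours_needed(UPLOADS_PER_DAY, PROPER_RATE)

print(f"backlog, proper checks: {backlog_hours:.0f} hours")
print(f"daily uploads, cursory: {daily_hours_cursory:.1f} hours/day")
print(f"daily uploads, proper:  {daily_hours_proper:.1f} hours/day")
```

On these figures, proper checks of one day's uploads alone take more reviewer-hours than there are hours in the day, before anyone touches the backlog.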
-- Mark [[User:Carnildo]]
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
- People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
I agree. Is there a way to provide a very quick summary to anyone using this tag?
Also, a possible technical solution: force the uploader of a fair-use image to specify which article it's to be used for, and prevent it being used in other articles. Can a fair use image often be legitimately used in different articles?
Steve
Steve Bennett wrote:
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
- People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
I agree. Is there a way to provide a very quick summary to anyone using this tag?
Also, a possible technical solution: force the uploader of a fair-use image to specify which article it's to be used for, and prevent it being used in other articles. Can a fair use image often be legitimately used in different articles?
Steve
Occasionally, yes. It would piss me off if I had to upload several different copies of the same image to illustrate articles like [[Ketuanan Melayu]] (the headliner image is used on about half a dozen other articles), especially since I think I actually understand fair use. :p
John
Perhaps this fictitious mechanism could allow you to specify more than one article?
Steve
On 3/10/06, John Lee johnleemk@gawab.com wrote:
Steve Bennett wrote:
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
- People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
I agree. Is there a way to provide a very quick summary to anyone using this tag?
Also, a possible technical solution: force the uploader of a fair-use image to specify which article it's to be used for, and prevent it being used in other articles. Can a fair use image often be legitimately used in different articles?
Steve
Occasionally, yes. It would piss me off if I had to upload several different copies of the same image to illustrate articles like [[Ketuanan Melayu]] (the headliner image is used on about half a dozen other articles), especially since I think I actually understand fair use. :p
John
Steve Bennett wrote:
Perhaps this fictitious mechanism could allow you to specify more than one article?
Steve
That wouldn't really solve the problem, I think. Right now the problem is most people upload fair use images to be used as illustrations in just one or two articles, but the fair use justification here is nearly nil. That's what we should be focusing on -- cases of people abusing a fair use image are rare, AFAIK.
John
On 3/10/06, Steve Bennett stevage@gmail.com wrote:
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
- People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
I agree. Is there a way to provide a very quick summary to anyone using this tag?
See my point #1: the average uploader doesn't care. A summary such as "If the owner of this image doesn't agree that it's fair use, I agree to be sued for copyright infringement" might work. I don't think anything less blunt would get the message across.
-- Mark [[User:Carnildo]]
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
See my point #1: the average uploader doesn't care. A summary such as "If the owner of this image doesn't agree that it's fair use, I agree to be sued for copyright infringement" might work. I don't think anything less blunt would get the message across.
I'd have rather strong reservations about anything so blunt - after all, fair use is a copyright infringement which is permissible under US law in certain specific cases. The issue of the owner objecting is secondary - there are people who would happily agree to Wikipedia using their material but not to downstream users, but that is incompatible with the aims of the project. On the other hand, something might be legitimately fair use and still draw objections from the copyright owner.
That said, I agree that stronger language might not be a problem for ALL media uploads.
Ian
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
See my point #1: the average uploader doesn't care. A summary such as "If the owner of this image doesn't agree that it's fair use, I agree to be sued for copyright infringement" might work. I don't think anything less blunt would get the message across.
Ok, assume some *do* care, can't we come up with some summary like "for fair use, you have to be actually talking about the image itself - not just the subject depicted in the image" or something? IANAL. I have to admit that fair use has always been a bit of a mystery to me as well - particularly in a university context it often gets interpreted as something like "don't take too much, and don't do it too publicly, and you'll be right".
Steve
Perhaps new users could be forced to view a brief tutorial about fair use, etc., before being allowed to upload images.
On 3/10/06, Rob gamaliel8@gmail.com wrote:
Perhaps new users could be forced to view a brief tutorial about fair use, etc., before being allowed to upload images.
That might work for the 10% who care about copyright. For the other 90%, it's just one more thing to click through: they don't care, they just want their pretty pictures.
-- Mark [[User:Carnildo]]
On 3/10/06, Steve Bennett stevage@gmail.com wrote:
Ok, assume some *do* care, can't we come up with some summary like "for fair use, you have to be actually talking about the image itself - not just the subject depicted in the image" or something?
Even that is not exactly true. Fair use is a very grey area. There are fair use uses outside criticism and commentary; it's just that those are some of the more clear-cut ones, and some of the ones that a commercial re-user of our content has more chance of also being able to use in defense.
-Matt
Steve Bennett wrote:
On 3/10/06, Mark Wagner carnildo@gmail.com wrote:
- People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
I agree. Is there a way to provide a very quick summary to anyone using this tag?
I think all the tags link to the relevant policy pages. But being a wiki that allows anybody to upload, I don't think it's plausible to operate under the assumption that every uploader understands all about fair use (much of which is murky even to lawyers) even after reading those pages, any more than we assume that there are no vandals, everybody knows how to spell, etc - not that I would mind prospective editors having to pass a spelling test before being allowed to touch articles. :-)
As long as this is a wiki, we are going to have bad uploads. The problem is that our image review process needs more time from more volunteers, and/or needs to be made more efficient somehow.
Also, a possible technical solution: force the uploader of a fair-use image to specify which article it's to be used for, and prevent it being used in other articles. Can a fair use image often be legitimately used in different articles?
That's the idea behind "fairusein" and its multi-argument friends. It doesn't however ensure the addition of a meaningful rationale.
Stan
Mark Wagner wrote:
Untagged images: 67 of 500 images had no templates whatsoever on them. Less than half that number had been tagged as "no source" or "no license". I get the feeling that the image-tagging project is falling behind here.
We're still catching up! People are crunching lists going back to last summer, not what was uploaded yesterday. 67 untagged per day is easily dealt with in an hour, if those are the only images needing tagging. Another proposal is to autoschedule new untagged uploads for deletion.
- People don't understand "fair use". The vast majority of uploaders think "fair use" means "I think it's fair that this Wikipedia article should be illustrated", while those who are aware that it's part of copyright law tend to have mistaken ideas, such as "educational use" allows anything.
Here's a thought - "fair use" images need to go through a review process *before* uploading, sort of like featured articles. If the "fair use image candidate" gets through the process, then it can be recorded and we don't have to fight about it anymore.
- Tags that are not in the upload menu are almost always used correctly.
Certainly an argument for dropping the menu...
Stan
On 3/10/06, Stan Shebs shebs@apple.com wrote:
Mark Wagner wrote:
Here's a thought - "fair use" images need to go through a review process *before* uploading, sort of like featured articles. If the "fair use image candidate" gets through the process, then it can be recorded and we don't have to fight about it anymore.
I have reservations about this. Fair use is a grey area legally - it's a permissible violation of copyright. If we had a formal review process, might that not make the project more liable if we get it wrong? In addition, if we make fair use more difficult, then we are likely to get people tagging images with free tags. If they are tagged as fair use, a lot of obvious violations show up. On the other hand, if it's mis-tagged as {{GFDL}}, we'd really have to go to the source.
I would rather make it a requirement that the uploader list a source or the name of the copyright holder. Lots of images don't have a source clearly marked, and don't have information about the copyright holder. How can we even hope to verify licensing if we don't know who holds the copyright (or at worst, where it came from)?
Ian
Guettarda wrote:
On 3/10/06, Stan Shebs shebs@apple.com wrote:
Mark Wagner wrote:
Here's a thought - "fair use" images need to go through a review process *before* uploading, sort of like featured articles. If the "fair use image candidate" gets through the process, then it can be recorded and we don't have to fight about it anymore.
I have reservations about this. Fair use is a grey area legally - it's a permissible violation of copyright. If we had a formal review process, might that not make the project more liable if we get it wrong?
So, uh, the solution is to pretend the problem doesn't exist?
In addition, if we make fair use more difficult, then we are likely to get people tagging images with free tags. If they are tagged as fair use, a lot of obvious violations show up. On the other hand, if it's mis-tagged as {{GFDL}}, we'd really have to go to the source.
That's absolutely correct. As much as I hate to say it, I'm beginning to think that the only way to solve this is to thwack all clueless people WRT fair use with a clue stick, and to just continue reviewing images as we stumble upon them. Reviewing them at the point of entry will just encourage people to lie on purpose instead of just unconsciously make a mistake.
I would rather make it a requirement that the uploader list a source or the name of the copyright holder. Lots of images don't have a source clearly marked, and don't have information about the copyright holder. How can we even hope to verify licensing if we don't know who holds the copyright (or at worst, where it came from)?
Ian
That's a very good idea. We should make a specific, mandatory field for this. Of course, we then run the risk of people making up sources and copyright holder names...
John
Guettarda wrote:
On 3/10/06, Stan Shebs shebs@apple.com wrote:
Mark Wagner wrote:
Here's a thought - "fair use" images need to go through a review process *before* uploading, sort of like featured articles. If the "fair use image candidate" gets through the process, then it can be recorded and we don't have to fight about it anymore.
I have reservations about this. Fair use is a grey area legally - it's a permissible violation of copyright. If we had a formal review process, might that not make the project more liable if we get it wrong? In addition, if we make fair use more difficult, then we are likely to get people tagging images with free tags. If they are tagged as fair use, a lot of obvious violations show up. On the other hand, if it's mis-tagged as {{GFDL}}, we'd really have to go to the source.
I'm not sure that review changes liability, it can be pointed to as a good-faith effort to do things correctly. All the empirical evidence so far is that we're much more picky about fair use than just about everybody else online, after all nobody seems to be shutting down the giant celebrity galleries from which many of our images are copied. WP's angst about fair use is internally generated.
On mistagging, that's a good point. We see that on commons daily, where people upload a movie poster and tag it GFDL, as if it wasn't going to be a blindingly obvious copyvio. I suppose the disincentive would be the certain knowledge of severe sanctions if one is caught.
I would rather make it a requirement that the uploader list a source or the name of the copyright holder. Lots of images don't have a source clearly marked, and don't have information about the copyright holder. How can we even hope to verify licensing if we don't know who holds the copyright (or at worst, where it came from)?
But then what's to keep someone from lying about the source any less than about license?
Stan
Stan Shebs wrote:
I'm not sure that review changes liability, it can be pointed to as a good-faith effort to do things correctly. All the empirical evidence so far is that we're much more picky about fair use than just about everybody else online, after all nobody seems to be shutting down the giant celebrity galleries from which many of our images are copied. WP's angst about fair use is internally generated.
I think the angst is generated with concern to commercial reusers. The line that has been discussed with me, or at least my understanding of that line, is that Wikipedia's value to commercial reusers drops if they have to evaluate all fair use images. What value there is to Wikipedia in having commercial reusers isn't, now I come to think of it, fully explained. If we're releasing this stuff GFDL then we can't be getting paid by commercial reusers, can we? Or do we instead receive generous donations from commercial reusers?
Steve block
On 3/10/06, Stan Shebs shebs@apple.com wrote:
Guettarda wrote:
I would rather make it a requirement that the uploader list a source or the name of the copyright holder. Lots of images don't have a source clearly marked, and don't have information about the copyright holder. How can we even hope to verify licensing if we don't know who holds the copyright (or at worst, where it came from)?
But then what's to keep someone from lying about the source any less than about license?
Stan
Of course it won't solve all of our problems, but it might make matters easier. While some people don't care about copyright, in my experience a lot of people are just clueless. Most of the untagged (or obviously mistagged) images I come across were uploaded by newbies - sometimes all they have done is upload a pile of images and add them to articles (full sized, at the top of the page...) They generally are people who mean well - I guess they just figure that we are the clueless ones who "don't know that you can get images of X everywhere on the web" or don't know about Google image search or... These people are often apologetic when you explain the situation to them.
The other thing I come across is people who upload images that are viable fair use candidates, but they don't know anything about sourcing the images. Recently someone uploaded an image of the new head of the ruling Jamaican political party (and next PM of Jamaica). When asked about licensing, he didn't know. Since he provided the source, which was the party website, it seemed reasonable to make a case for {{publicity}}. If, on the other hand, it had come from the Jamaica Gleaner, it would not be something we can use. Since he provided the source, it was possible to determine copyright status.
What's more important though is when someone tags an image with a free license. If you force them to provide a source and they provide a false one, it may be possible to check the website to see what it says with regards to copyright. Especially if it's {{GFDL}} - AFAICT, if an image is GFDL, and you don't identify the creator, you are violating the license. Identifying and crediting image creators strikes me as important in and of itself, let alone from the practical viewpoint of being able to verify the accuracy of tags.
Ian
Guettarda wrote:
Of course it won't solve all of our problems, but it might make matters easier. While some people don't care about copyright, in my experience a lot of people are just clueless. Most of the untagged (or obviously mistagged) images I come across were uploaded by newbies - sometimes all they have done is upload a pile of images and add them to articles (full sized, at the top of the page...) They generally are people who mean well - I guess they just figure that we are the clueless ones who "don't know that you can get images of X everywhere on the web" or don't know about Google image search or... These people are often apologetic when you explain the situation to them.
Now there's an idea: maybe we should just add a big notice to the upload page saying "Yes, we know there are lots of images you can download off the web. We don't want them." and in a smaller font "(For exceptions, see [[Wikipedia:Fair use]] and [[Wikipedia:Free image resources]].)".
On 3/10/06, Ilmari Karonen nospam@vyznev.net wrote:
Now there's an idea: maybe we should just add a big notice to the upload page saying "Yes, we know there are lots of images you can download off the web. We don't want them." and in a smaller font "(For exceptions, see [[Wikipedia:Fair use]] and [[Wikipedia:Free image resources]].)".
-- Ilmari Karonen
Tempting but not overkill enough. I'd like selecting a fair use tag to take you to a screen with the criteria for using that tag on it in big letters.
-- geni
geni wrote:
On 3/10/06, Ilmari Karonen nospam@vyznev.net wrote:
Now there's an idea: maybe we should just add a big notice to the upload page saying "Yes, we know there are lots of images you can download off the web. We don't want them." and in a smaller font "(For exceptions, see [[Wikipedia:Fair use]] and [[Wikipedia:Free image resources]].)".
Tempting but not overkill enough. I'd like selecting a fair use tag to take you to a screen with the criteria for using that tag on it in big letters.
The big bold notice is there now, let's see if it helps any. I'm also fully in favor of your proposal, should someone wish to implement it.
On 3/10/06, Stan Shebs shebs@apple.com wrote:
Mark Wagner wrote:
Untagged images: 67 of 500 images had no templates whatsoever on them. Less than half that number had been tagged as "no source" or "no license". I get the feeling that the image-tagging project is falling behind here.
We're still catching up! People are crunching lists going back to last summer, not what was uploaded yesterday. 67 untagged per day is easily dealt with in an hour, if those are the only images needing tagging. Another proposal is to autoschedule new untagged uploads for deletion.
There were 1866 images uploaded on that day. I only checked about a quarter of them, so it's actually around 250-300 untagged images. I could certainly modify OrphanBot to tag images with a blank image description page as "no source", but if there's content on the image description page (which is true for most of them), it takes a human to tell the difference between {{no license}} and {{GFDL-presumed}}.
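The extrapolation from 67 of 500 to the full day's uploads can be checked with a short calculation. The sketch below assumes the 500 checked images behave like a random sample of the 1866, which the thread does not state, and uses a plain normal-approximation interval without a finite-population correction; the function name `extrapolate` is illustrative.

```python
import math


def extrapolate(hits, sample, population, z=1.96):
    """Scale a sample proportion to the population, with a rough 95% range.

    Assumes the sample is random; uses the normal approximation to the
    binomial and ignores the finite-population correction.
    """
    p = hits / sample
    se = math.sqrt(p * (1 - p) / sample)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p * population, low * population, high * population


estimate, low, high = extrapolate(67, 500, 1866)
print(f"untagged that day: about {estimate:.0f} (roughly {low:.0f} to {high:.0f})")
```

Under those assumptions the point estimate sits at the low end of the quoted "250-300", with the upper end of the range just above 300.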
- Tags that are not in the upload menu are almost always used correctly.
Certainly an argument for dropping the menu...
I don't know about that. If you want to use something that's not in the menu, it means two things: (1) you care enough to want the image to be tagged correctly, and (2) you know enough to realize that the tag you want isn't in the menu. Getting rid of the menu won't make this true for more people.
-- Mark [[User:Carnildo]]
Quoting Mark Wagner carnildo@gmail.com:
I could certainly modify OrphanBot to tag images with a blank image description page as "no source", but if there's content on the image
Even if almost all images have something on the description page, the above still sounds like a useful thing to do for the ones that do not.
Jkelly
Mark Wagner wrote:
On 3/10/06, Stan Shebs shebs@apple.com wrote:
Mark Wagner wrote:
Untagged images: 67 of 500 images had no templates whatsoever on them. Less than half that number had been tagged as "no source" or "no license". I get the feeling that the image-tagging project is falling behind here.
We're still catching up! People are crunching lists going back to last summer, not what was uploaded yesterday. 67 untagged per day is easily dealt with in an hour, if those are the only images needing tagging. Another proposal is to autoschedule new untagged uploads for deletion.
There were 1866 images uploaded on that day. I only checked about a quarter of them, so it's actually around 250-300 untagged images. I could certainly modify OrphanBot to tag images with a blank image description page as "no source", but if there's content on the image description page (which is true for most of them), it takes a human to tell the difference between {{no license}} and {{GFDL-presumed}}.
I'd think you'd want a new tag {{untagged}} (yes, lots of self-referential hilarity potential), so that you're not expressing an opinion on any information already on the page. And if the bot drops a note on the talk page within a day, then many of the new but legit uploaders can go back and do the fixup work themselves.
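A rough sketch of how the two suggestions in this exchange could fit together: blank description pages get "no source", and pages with text but no template at all get the proposed {{untagged}}. This is not OrphanBot's code; the function names and the regular expression are assumptions for illustration, the sketch does not handle nested templates, and it does not try to tell license templates from other templates.

```python
import re

# Matches a simple, non-nested {{name}} or {{name|args}} and captures the name.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")


def templates_on(wikitext):
    """Return the names of the simple templates found in the page text."""
    return [match.group(1) for match in TEMPLATE_RE.finditer(wikitext)]


def suggest_tag(wikitext):
    """Suggest a maintenance tag for a description page, or None.

    A blank page gets "no source"; a page with text but no template
    gets the proposed "untagged", which makes no claim about the text.
    """
    if not wikitext.strip():
        return "no source"
    if not templates_on(wikitext):
        return "untagged"
    return None


print(suggest_tag(""))
print(suggest_tag("A photo I found on the web"))
print(suggest_tag("My own work. {{GFDL-self}}"))
```

A page that already carries any template is left alone here, which keeps the bot from second-guessing information a human has already added.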
Stan
Mark Wagner wrote:
Brief summary: We are in *deep* shit.
The misapprehensions regarding fair use may have been festering quite some time. To put the cat amongst the pigeons: [[Image:AtlasAward.jpg]].
Since fair use images are not allowable in user space, and barnstars and awards are given to people on their user page, it's kind of interesting to see this precedent set. There's brief discussion on the fair use issue at the bottom of [[User talk:Pedant/2004-11-19]].
It seems to be an old enough image and use that some discussion might be warranted.
Steve block