It would appear that we are breaking the law with respect to copyright at download.wikimedia.org.
This is a result of our policy about the insertion of copyrighted material into articles:
[[Wikipedia:Copyright problems#Instructions]] "Pages where the most recent edit is a copyright violation, but the previous article was not, should not be deleted. They should be reverted. The violating text will remain in the page history for archival reasons unless the copyright holder asks the Wikimedia Foundation to remove it."
As a result, our database contains large quantities of violating material. Because this material is completely untagged (it just looks like a normal revert in most cases), someone who wanted to redistribute the database without substantial liability would be unable to do so.
It is unfortunate that we will not be able to always find and remove every copyright violation, but when we instruct our editors to sweep violations they discover under the rug, our actions could easily be construed as willful infringement.
At a minimum we should instruct editors to tag reverts with a uniform tag. In the case where the copyvio was only in the most recent version it would be fairly trivial (though computationally expensive) to sweep the DB and prune revisions that were copyvio. In cases where there were other edits after the copyvio there may be no automatic way to remove the violating text... but at least we should be tagging these changes.
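A minimal sketch of the sweep described above, assuming reverts carry a uniform tag in their edit summary. The tag string `[copyvio-revert]`, the `Revision` record and the function name are hypothetical and are not part of MediaWiki. The sketch treats the single revision between a tagged revert and the version it restored as safe to prune automatically, and anything more complicated as needing manual review.

```python
from dataclasses import dataclass

COPYVIO_TAG = "[copyvio-revert]"  # hypothetical uniform tag, not an existing convention


@dataclass
class Revision:
    rev_id: int
    text: str
    summary: str


def find_copyvio_candidates(history):
    """Scan a page history (oldest revision first).

    Returns two lists of revision ids:
      auto_prune   - exactly one revision sat between the restored version
                     and the tagged revert (the simple case)
      needs_review - several revisions sat in between, so later edits may
                     be mixed in with the violating text
    """
    auto_prune, needs_review = [], []
    for i, rev in enumerate(history):
        if COPYVIO_TAG not in rev.summary:
            continue
        # Find the most recent earlier revision whose text the revert restored.
        restored = None
        for j in range(i - 1, -1, -1):
            if history[j].text == rev.text:
                restored = j
                break
        if restored is None:
            continue  # tagged edit was not a plain revert; leave it to a human
        between = [r.rev_id for r in history[restored + 1:i]]
        if len(between) == 1:
            auto_prune.extend(between)
        else:
            needs_review.extend(between)
    return auto_prune, needs_review


history = [
    Revision(1, "clean text", "create article"),
    Revision(2, "clean text plus pasted essay", "expand"),
    Revision(3, "clean text", "[copyvio-revert] text copied from external site"),
]
print(find_copyvio_candidates(history))
```

Comparing full revision text is the expensive part on a real database; a stored content hash per revision, if available, would serve the same purpose.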
Gregory Maxwell wrote in gmane.science.linguistics.wikipedia.misc:
It would appear that we are breaking the law with respect to copyright at download.wikimedia.org.
this is not particular to downloads, but applies to any place that old revisions are available, including the web interface.
This is a result of our policy about the insertion of copyrighted material into articles:
[[Wikipedia:Copyright problems#Instructions]] "Pages where the most recent edit is a copyright violation, but the previous article was not, should not be deleted. They should be reverted. The violating text will remain in the page history for archival reasons unless the copyright holder asks the Wikimedia Foundation to remove it."
does this policy apply everywhere or only en.wp?
As a result our database contains large quantities of violating material.
my understanding of the relevant US law is that we are not required to aggressively remove copyright violations until requested by the copyright holder.
kate.
Kate Turner (keturner@livejournal.com) [050530 08:37]:
Gregory Maxwell wrote in gmane.science.linguistics.wikipedia.misc:
As a result our database contains large quantities of violating material.
my understanding of the relevant US law is that we are not required to aggressively remove copyright violations until requested by the copyright holder.
If someone ever codes the facility to zap old revisions easily, we could deal with such without great pain. The main barrier at present is that zapping old revs entails deleting the article then restoring all revs except the offender. On a heavily-trafficked page this presents obvious logistical problems.
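To illustrate why the delete-then-restore route scales badly, here is a toy in-memory model of it. The `Page` class and `zap_revisions` function are hypothetical stand-ins and do not reflect the real MediaWiki code or schema. The point is only that removing one revision means touching every other revision in the history.

```python
class Page:
    """Toy model: revision ids are plain integers."""

    def __init__(self, revisions):
        self.live = list(revisions)   # visible in the page history
        self.deleted = []             # archived, hidden from readers

    def delete_all(self):
        # Deleting the article moves every revision to the archive.
        self.deleted.extend(self.live)
        self.live = []

    def restore(self, rev_ids):
        # Bring the selected archived revisions back into the live history.
        wanted = set(rev_ids)
        self.live = sorted(self.live + [r for r in self.deleted if r in wanted])
        self.deleted = [r for r in self.deleted if r not in wanted]


def zap_revisions(page, offenders):
    """Remove the offending revisions via delete + selective restore.

    Returns how many revisions had to be restored, i.e. the amount of work
    caused by removing only a handful of revisions.
    """
    offenders = set(offenders)
    keep = [r for r in page.live if r not in offenders]
    page.delete_all()
    page.restore(keep)
    return len(keep)


page = Page(range(1, 1001))           # a heavily edited page
restored = zap_revisions(page, [500])
print(restored, len(page.live), page.deleted)
```

A dedicated "delete this one revision" facility would reduce the work to the size of the offender list rather than the size of the whole history.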
Our proactive approach, though, does stand us in good stead should we ever end up in a courtroom. Generally the copyvio page is *rabid* and that's good. Even if on many occasions the author of a given text has to point out they're the editor that added it to Wikipedia ;-)
- d.
Interestingly enough, the only time I pilfered text from an essay I wrote (for something other than WP) that is widely available, and used it on WP, nobody ever caught it and introduced the possibility of a copyvio - it still stands.
Mark
On 29/05/05, David Gerard fun@thingy.apana.org.au wrote:
Kate Turner (keturner@livejournal.com) [050530 08:37]:
Gregory Maxwell wrote in gmane.science.linguistics.wikipedia.misc:
As a result our database contains large quantities of violating material.
my understanding of the relevant US law is that we are not required to aggressively remove copyright violations until requested by the copyright holder.
If someone ever codes the facility to zap old revisions easily, we could deal with such without great pain. The main barrier at present is that zapping old revs entails deleting the article then restoring all revs except the offender. On a heavily-trafficked page this presents obvious logistical problems.
Our proactive approach, though, does stand us in good stead should we ever end up in a courtroom. Generally the copyvio page is *rabid* and that's good. Even if on many occasions the author of a given text has to point out they're the editor that added it to Wikipedia ;-)
- d.
Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
On Mon, 30 May 2005 09:02:12 +1000, David Gerard wrote:
Our proactive approach, though, does stand us in good stead should we ever end up in a courtroom. Generally the copyvio page is *rabid* and that's good. Even if on many occasions the author of a given text has to point out they're the editor that added it to Wikipedia ;-)
Based on my experience in patrolling WP.en I think you are a tad over-optimistic. What you see on WP:CP may only be the tip of the iceberg.
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
The longer a copyvio remains unchallenged, the harder it becomes to catch it after several copyedits changed bits and pieces of it. That doesn't fix the copyvio, though. I had a case just yesterday where my reverting a copyvio was reverted again, and editors started "refactoring" the text to "mitigate" the problem.
<rant> Some comments on [[m:Avoid Copyright Paranoia]] ain't exactly helping. </rant>
Roger
Roger Luethi wrote:
On Mon, 30 May 2005 09:02:12 +1000, David Gerard wrote:
Our proactive approach, though, does stand us in good stead should we ever end up in a courtroom. Generally the copyvio page is *rabid* and that's good. Even if on many occasions the author of a given text has to point out they're the editor that added it to Wikipedia ;-)
Based on my experience in patrolling WP.en I think you are a tad over-optimistic. What you see on WP:CP may only be the tip of the iceberg.
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
The longer a copyvio remains unchallenged, the harder it becomes to catch it after several copyedits changed bits and pieces of it. That doesn't fix the copyvio, though. I had a case just yesterday where my reverting a copyvio was reverted again, and editors started "refactoring" the text to "mitigate" the problem.
Refactoring may completely remove the problem, not just mitigate it. Copyright applies to the way something is expressed, and not to what is expressed. So what's the difference between a copyvio text that has been refactored, and the same refactored text being added from the beginning?
Some comments on [[m:Avoid Copyright Paranoia]] ain't exactly helping.
But the fundamental idea there is still an excellent rule of thumb.
Ec
On copyvios, I was surprised today to find that someone had removed the article J. C. Penney on English Wikipedia as a copyvio. The article was not a copyvio from September, 2003 until March of this year, when someone pasted in a big screed from an external site. Instead of just reverting the screed, someone deleted the whole article as a copyvio. This strikes me as very bad practice. I only noticed this because someone else started a new article, and I thought it incredible that there had not already been such an article.
On Tue, 31 May 2005 11:03:44 -0700, Ray Saintonge wrote:
Refactoring may completely remove the problem, not just mitigate it.
So I guess there is no need to ever flag a copyvio, then. Should some copyright holder ever complain, we will just change some words in his text and the problem will go away. Well, that sure makes life a lot easier.
Copyright applies to the way something is expressed, and not to what is expressed. So what's the difference between a copyvio text that has been refactored, and the same refactored text being added from the beginning?
Why do we delete pages that start as copyvios, rather than refactoring them?
Roger
Roger Luethi (collector@hellgate.ch) [050601 07:39]:
So I guess there is no need to ever flag a copyvio, then. Should some copyright holder ever complain, we will just change some words in his text and the problem will go away. Well, that sure makes life a lot easier.
Well, not quite ... but that's what a rewrite is.
Why do we delete pages that start as copyvios, rather than refactoring them?
To get the revisions out of the database while it's still easy to do so. The rewriting happens on [[pagename/temp]].
- d.
On Wed, 01 Jun 2005 08:11:23 +1000, David Gerard wrote:
Why do we delete pages that start as copyvios, rather than refactoring them?
To get the revisions out of the database while it's still easy to do so. The rewriting happens on [[pagename/temp]].
Yeah, that's what I thought. And I am also aware of the problems that keep us from completely purging copyvios from the revision history if they happen only at a later stage. It was my understanding that in those cases, we roll back and add the information back in a rewritten form.
_However_, some editors seem to think that even a rollback to the state before the copyvio is entirely unreasonable unless the copyvio was the most recent edit.
Check out [[en:Terrorism in Kashmir]] for a recent example, where many paragraphs (some 90% of the article at the time of the copyvio) were lifted literally from BBC News and yet my rollback was reverted.
I used to think we had a clear rollback policy for such cases, but the discussion here seems to indicate that I argue a minority position.
Roger
On 5/31/05, Roger Luethi collector@hellgate.ch wrote:
So I guess there is no need to ever flag a copyvio, then. Should some copyright holder ever complain, we will just change some words in his text and the problem will go away. Well, that sure makes life a lot easier.
The biggest reason for tagging and a one-week listing is review: some pages are mistakenly tagged.
It also gives time and space for copyright holders to give permission, and time to rewrite the article before the offending pages are deleted.
Roger
Roger Luethi wrote:
On Tue, 31 May 2005 11:03:44 -0700, Ray Saintonge wrote:
Refactoring may completely remove the problem, not just mitigate it.
So I guess there is no need to ever flag a copyvio, then. Should some copyright holder ever complain, we will just change some words in his text and the problem will go away. Well, that sure makes life a lot easier.
Do you have statistics on how many have complained?
Copyright applies to the way something is expressed, and not to what is expressed. So what's the difference between a copyvio text that has been refactored, and the same refactored text being added from the beginning?
Why do we delete pages that start as copyvios, rather than refactoring them?
I really can't answer that, since I would never have preferred that as a first option. The important step is taking prompt action when the copyvio is identified.
Ec
On Tue, 31 May 2005 16:59:37 -0700, Ray Saintonge wrote:
So I guess there is no need to ever flag a copyvio, then. Should some copyright holder ever complain, we will just change some words in his text and the problem will go away. Well, that sure makes life a lot easier.
Do you have statistics on how many have complained?
No, I don't. What I do have is some anecdotal evidence.
Last week, I requested that an editor ask permission from a copyright holder, and the copyright holder gave permission because she was so pleasantly surprised that for the first time ever, someone actually asked her, although the copyvio was just the most recent one of many. Not surprisingly, she didn't have kind words for the way copyvios are handled at WP.
FWIW, I did ask her about the other incidents, but didn't hear back.
Roger
On 5/31/05, Roger Luethi collector@hellgate.ch wrote:
On Mon, 30 May 2005 09:02:12 +1000, David Gerard wrote:
Our proactive approach, though, does stand us in good stead should we ever end up in a courtroom. Generally the copyvio page is *rabid* and that's good. Even if on many occasions the author of a given text has to point out they're the editor that added it to Wikipedia ;-)
Based on my experience in patrolling WP.en I think you are a tad over-optimistic. What you see on WP:CP may only be the tip of the iceberg.
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
The longer a copyvio remains unchallenged, the harder it becomes to catch it after several copyedits changed bits and pieces of it. That doesn't fix the copyvio, though. I had a case just yesterday where my reverting a copyvio was reverted again, and editors started "refactoring" the text to "mitigate" the problem.
<rant> Some comments on [[m:Avoid Copyright Paranoia]] ain't exactly helping. </rant>
Roger
I have been informally surveying new pages, about 9% are copyvios (out of 253 checked).
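As a rough check on how much weight a sample of that size can bear, the snippet below computes a 95% Wilson score interval for the copyvio rate. The raw count is not given in the message, so the figure of 23 hits (9% of 253, rounded) is an assumption, and the survey was informal rather than a random sample, so the interval is only indicative.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


CHECKED = 253
ASSUMED_HITS = round(0.09 * CHECKED)  # assumption: the message only says "about 9%"

low, high = wilson_interval(ASSUMED_HITS, CHECKED)
print(f"{ASSUMED_HITS} of {CHECKED}: {low:.1%} to {high:.1%}")
```

Under these assumptions the plausible range is roughly 6% to 13% of new pages, which is wide but still consistent with the "tip of the iceberg" concern.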
On 5/31/05, Roger Luethi collector@hellgate.ch wrote:
Based on my experience in patrolling WP.en I think you are a tad over-optimistic. What you see on WP:CP may only be the tip of the iceberg.
I would agree.
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
The longer a copyvio remains unchallenged, the harder it becomes to catch it after several copyedits changed bits and pieces of it. That doesn't fix the copyvio, though. I had a case just yesterday where my reverting a copyvio was reverted again, and editors started "refactoring" the text to "mitigate" the problem.
My favorite en.wikipedia copyvio story: Marked an image as copyvio on the wp:cp page, two weeks later nothing was done, so I replaced the image with a cruddy sketch just to get it out of the current version... A few days later another wikipedian replaced my cruddy sketch with another drawing (the first hit for the subject on GIS at the time) and tagged as PD, saying in the summary that it came from a state university webpage and was thus PD (!!?!). The image actually came off some student's home page. I contacted the student, and the image was not PD, nor was the student the copyright holder. :) ... So far, every single copyright tag I've checked out on en has been incorrect, although I've only checked out the suspicious ones, so it's not a fair assessment.
With text it's even worse because our public editing process makes it much easier for someone to prove that our text was a derived work, where in a more traditional medium a sufficient amount of refactoring would usually manage to hide the violation.
With images I plan on just replacing all the ones with suspect copyright (i.e. everything that isn't CC* or GFDL and uploaded by the author, or with an actual letter attached that explicitly says PD or an acceptable license) over time... but I have no idea how to solve text.
On 5/31/05, Gregory Maxwell gmaxwell@gmail.com wrote:
With images I plan on just replacing all the ones with suspect copyright (i.e. everything that isn't CC* or GFDL and uploaded by the author, or with an actual letter attached that explicitly says PD or an acceptable license) over time...
I note that this standard is quite different from Wikipedia's current image policy.
If you, say, came across my photo album on Flickr, on which every photo has a CC-BY-SA-2.0 license attached - would you consider those images unsuitable for use on Wikipedia unless you obtained a letter from me in precise legal language? (And I note you say a LETTER, not an email. I am certain as hell not going to be writing via snail mail just to let someone use my damn photos).
Sounds like you'd also reject the use of out-of-copyright images since we couldn't get a letter stating permission.
It all seems rather extreme to me, unless this isn't what you meant.
Besides, would you trust anyone to tell the truth about even images they own, if you're getting that paranoid?
-Matt (User:Morven)
Roger Luethi wrote:
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
I wonder what we can do to improve the situation.
<rant> Some comments on [[m:Avoid Copyright Paranoia]] ain't exactly helping. </rant>
Indeed.
--Jimbo
Jimmy Wales (jwales@wikia.com) [050605 06:44]:
Roger Luethi wrote:
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
I wonder what we can do to improve the situation.
Making it easier to selectively delete old versions from the history would help a lot in making sure the stuff is not easily accessible. (I expect that's a "Great! Write it.")
- d.
On Sat, 04 Jun 2005 17:15:37 +0200, Jimmy Wales wrote:
I find plenty of copyvios that are several days old, and I've had to undo the work of weeks or months because some people just copy and paste stuff from other web sites (no, not WP mirrors) without being caught. That is a very unpleasant experience for everyone involved, especially if other editors kept working on the text in good faith.
I wonder what we can do to improve the situation.
I agree with David Gerard's suggestion, but here I will focus on how to prevent copyvios or catch them early:
* Make the copyvio warning on the edit page more visible. I notice that the German WP comes with a warning in a fat box with a red border. It is so cheap I am positive it will pay for its cost. Scroll to the bottom of this page to see what it looks like: http://de.wikipedia.org/w/index.php?title=Saint-Cloud&action=edit
* Clarify policy: WP:CP works quite well for pages that started as copyvio, at least if they are caught early. The page also gives instructions for dealing with pages "where the most recent edit is a copyright violation, but the previous article was not". However, what if a copyvio added material several months ago, and many editors kept working on the article afterwards? I say the article remains a derived work and must be reverted to the last clean state, but others disagree. Either way, there are too many conflicting opinions scattered all over WP and meta.
* Be strict: I contend that a key reason for the epidemic is that many, even experienced editors are both too lenient and too careless. Large contributions of perfect prose from unknown editors do not trigger suspicion and checks nearly as often as they should. And unlike vandalism or personal attacks, copyvios are often met with a cavalier attitude which sends the wrong message.
Having a clear policy and being strict about it would make at least the regulars more vigilant which in turn should help prevent unpleasant surprises further down the road. It would also make the task of hunting down copyvios cheaper, because arguing with unabashed copyvio apologists is a significant cost today.
I can imagine software tools more advanced than the current method of manually feeding some snippets from suspicious contributions to Google, but why bother if we don't pick the low hanging fruit as outlined above first?
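For what it is worth, the manual snippet method could be semi-automated along these lines. This is only a sketch: the function name and its thresholds are made up for illustration, and it only prepares quoted phrases to paste into a search engine, it does not call any search API.

```python
import re


def pick_snippets(text, count=3, min_words=8, max_words=12):
    """Pick a few long, distinctive phrases from a suspicious contribution.

    Returns exact-phrase queries (wrapped in double quotes) that can be
    pasted into a search engine by hand.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        words = sentence.replace('"', "").split()
        if len(words) >= min_words:
            # Search engines cap query length, so keep only the first words.
            candidates.append(" ".join(words[:max_words]))
    # Longer phrases are less likely to match by coincidence.
    candidates.sort(key=len, reverse=True)
    return ['"%s"' % c for c in candidates[:count]]


sample = (
    "Short one. This sentence has exactly eight words in it. "
    "Here is another fairly long sentence that contains more than "
    "twelve words in total."
)
for query in pick_snippets(sample):
    print(query)
```

Even this much leaves the judgment call (mirror, permission, or violation) to a human, which is where the policy questions above come in.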
Roger
On 6/6/05, Roger Luethi collector@hellgate.ch wrote:
- Make the copyvio warning on the edit page more visible. I notice that the German WP comes with a warning in a fat box with a red border. It is so cheap I am positive it will pay for its cost. Scroll to the bottom of this page to see what it looks like: http://de.wikipedia.org/w/index.php?title=Saint-Cloud&action=edit
This is great, we also need to make it a practice of asking uploaders if *they* are the copyright holder.
- Clarify policy: WP:CP works quite well for pages that started as copyvio, at least if they are caught early. The page also gives instructions for dealing with pages "where the most recent edit is a copyright violation, but the previous article was not". However, what if a copyvio added material several months ago, and many editors kept working on the article afterwards? I say the article remains a derived work and must be reverted to the last clean state, but others disagree. Either way, there are too many conflicting opinions scattered all over WP and meta.
I'm in your camp.
- Be strict: I contend that a key reason for the epidemic is that many, even experienced editors are both too lenient and too careless. Large contributions of perfect prose from unknown editors do not trigger suspicion and checks nearly as often as they should. And unlike vandalism or personal attacks, copyvios are often met with a cavalier attitude which sends the wrong message.
This is related to the above: if the cost of a violation is just clipping out some text later, then people aren't likely to be strict about it. People do not account for the huge potential liabilities copyright violation places on the project and on users of our content, or for the cost in terms of negative publicity if we are branded a bunch of thieves. Once the quality of the vast majority of our content becomes unquestionably good, the next obvious way to knock us is to say we got there via theft...
I think we need to get much more strict on material submitted by someone other than its copyright holder. The issues are too complex for us to expect anyone to get it right; at least the "I made this" case is simple enough that we should only go afoul with ill-intentioned people and complete idiots. Our community is now big enough that we can reasonably expect it to generate the vast majority of the media we need of any type. Exceptions should obviously be granted for logos, historic images, etc., but they should be handled as exceptions and not the norm. On the plus side this will get us more works which are very well targeted to our needs, rather than misappropriated stock photography which is only partially what we want.
On Mon, 06 Jun 2005 21:59:40 -0400, Gregory Maxwell wrote:
On 6/6/05, Roger Luethi collector@hellgate.ch wrote:
- Make the copyvio warning on the edit page more visible. I notice that the German WP comes with a warning in a fat box with a red border. It is so cheap I am positive it will pay for its cost. Scroll to the bottom of this page to see what it looks like: http://de.wikipedia.org/w/index.php?title=Saint-Cloud&action=edit
This is great, we also need to make it a practice of asking uploaders if *they* are the copyright holder.
I put a proposal up on [[en:MediaWiki talk:Copyrightwarning]].
Roger
Kate Turner wrote:
Gregory Maxwell wrote in gmane.science.linguistics.wikipedia.misc:
It would appear that we are breaking the law with respect to copyright at download.wikimedia.org.
this is not particular to downloads, but applies to any place that old revisions are available, including the web interface.
An apparent infringement is not always "breaking the law."
This is a result of our policy about the insertion of copyrighted material into articles:
[[Wikipedia:Copyright problems#Instructions]] "Pages where the most recent edit is a copyright violation, but the previous article was not, should not be deleted. They should be reverted. The violating text will remain in the page history for archival reasons unless the copyright holder asks the Wikimedia Foundation to remove it."
does this policy apply everywhere or only en.wp?
It seems like a reasonable policy. This is directly concerned with a core principle so I would suggest that it should be broadly applied.
As a result our database contains large quantities of violating material.
my understanding of the relevant US law is that we are not required to aggressively remove copyright violations until requested by the copyright holder.
Yes and no. Technically one is required to remove material that one knows to be an infringement when one becomes aware of it by any means. However, in the absence of a take down order there is the easy defence of not knowing that the material is an infringement. "Not knowing" is a broader concept than simply being able to identify a certain passage as identical to that by someone else. If we determine that to be the case we have an infringement as a question of fact, but not necessarily as a question of law. If we maintain the offending material in the current article after it has been fully identified as an infringement, the likelihood that we also have an infringement in law is greater. For purposes of this line of reasoning we can safely assume that a claim of fair use in the current article is not available, but the standards of fair use may be considerably relaxed.
Access to the infringing material is considerably more limited, even though in theory anyone can access it. Let's be practical! Who is going to spend a lot of his time looking through often lengthy article histories looking for copyvios to plagiarize? If a copyright holder properly makes a request to take down the material from the archive we should be prepared to comply, but until that happens we have no need to be so worried about this stuff.
I agree that copyvio material should be identified, but NOT in the edit summary. That only makes it easier for someone to go through the edit histories to find it. A two-step approach may be better. In the first edit the article is tagged as a copyvio; in a new edit the offending material along with the tag are removed from the article.
Ec
wikipedia-l@lists.wikimedia.org