Kate Turner wrote:
Gregory Maxwell wrote in gmane.science.linguistics.wikipedia.misc:
It would appear that we are breaking the law with respect to copyright at download.wikimedia.org.
This is not particular to downloads; it applies to any place where old revisions are available, including the web interface.
An apparent infringement is not always "breaking the law."
This is a result of our policy about the insertion of copyrighted material into articles:
[[Wikipedia:Copyright problems#Instructions]] "Pages where the most recent edit is a copyright violation, but the previous article was not, should not be deleted. They should be reverted. The violating text will remain in the page history for archival reasons unless the copyright holder asks the Wikimedia Foundation to remove it."
Does this policy apply everywhere, or only on en.wp?
It seems like a reasonable policy. It is directly concerned with a core principle, so I would suggest that it be applied broadly.
As a result our database contains large quantities of violating material.
My understanding of the relevant US law is that we are not required to aggressively remove copyright violations until the copyright holder requests it.
Yes and no. Technically, one is required to remove material that one knows to be an infringement as soon as one becomes aware of it, by whatever means. However, in the absence of a takedown order there is the easy defence of not knowing that the material is an infringement. "Not knowing" is a broader concept than simply being able to identify a certain passage as identical to one written by someone else. If we determine that a passage is identical, we have an infringement as a question of fact, but not necessarily as a question of law. If we keep the offending material in the current article after it has been fully identified as an infringement, the likelihood that we also have an infringement in law is greater. For the purposes of this line of reasoning, we can safely assume that a claim of fair use is not available for the current article, but for material that survives only in the page history the standards of fair use may be considerably more relaxed.
Access to the infringing material in the history is considerably more limited, even though in theory anyone can reach it. Let's be practical! Who is going to spend a lot of time searching through often lengthy article histories for copyvios to plagiarize? If a copyright holder makes a proper request to remove the material from the archive, we should be prepared to comply, but until that happens we have no need to worry so much about this.
I agree that copyvio material should be identified, but NOT in the edit summary. That only makes it easier for someone to go through the edit histories to find it. A two-step approach may be better: in the first edit the article is tagged as a copyvio; in a second edit the offending material is removed along with the tag.
Ec