[Wikipedia-l] Copyright violations at download.wikimedia.org

Gregory Maxwell gmaxwell at gmail.com
Sun May 29 22:25:59 UTC 2005


It would appear that we are breaking the law with respect to copyright
at download.wikimedia.org.

This is a result of our policy about the insertion of copyrighted
material into articles:

[[Wikipedia:Copyright problems#Instructions]]
"Pages where the most recent edit is a copyright violation, but the
previous article was not, should not be deleted. They should be
reverted. The violating text will remain in the page history for
archival reasons unless the copyright holder asks the Wikimedia
Foundation to remove it."

As a result, our database contains large quantities of violating
material. Because this material is completely untagged (in most cases
it just looks like a normal revert), someone who wanted to
redistribute the database without substantial liability would be
unable to do so.

It is unfortunate that we will not always be able to find and remove
every copyright violation, but when we instruct our editors to sweep
the violations they discover under the rug, our actions could easily
be construed as willful infringement.

At a minimum, we should instruct editors to mark copyvio reverts with
a uniform tag. In the case where the copyvio was only in the most
recent version, it would be fairly trivial (though computationally
expensive) to sweep the DB and prune the revisions that were copyvio.
In cases where there were other edits after the copyvio, there may be
no automatic way to remove the violating text, but we should at least
be tagging these changes.
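
To make the sweep concrete, here is a rough sketch of what it might
look like for a single page history. Everything in it is an
assumption for illustration: the tag string "[copyvio-revert]", the
Revision record, and the function name are hypothetical and are not
part of MediaWiki or of any existing policy. It only prunes the easy
case, where the tagged revert restores exactly the text that preceded
the violation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical uniform tag that editors would put in the edit summary.
COPYVIO_TAG = "[copyvio-revert]"


@dataclass
class Revision:
    """Simplified stand-in for one row of page history (assumed shape)."""
    rev_id: int
    text: str
    comment: str


def prune_tagged_copyvios(history: List[Revision]) -> List[Revision]:
    """Return the history without revisions undone by a clean tagged revert.

    `history` is ordered oldest to newest. A revision is pruned only when
    the next revision carries COPYVIO_TAG and its text is identical to the
    revision just before the violation. Anything more tangled is left
    alone for manual review.
    """
    pruned_ids = set()
    for i in range(2, len(history)):
        revert = history[i]
        if COPYVIO_TAG not in revert.comment:
            continue
        # Clean case: the revert restores the pre-violation text exactly,
        # so the single revision in between is the violating one.
        if history[i - 2].text == revert.text:
            pruned_ids.add(history[i - 1].rev_id)
    return [rev for rev in history if rev.rev_id not in pruned_ids]
```

A history of "good", "copied text", then a tagged revert back to
"good" would lose only the middle revision. If someone edited on top
of the copied text before the tagged revert, nothing is pruned, which
matches the case above where there may be no automatic fix.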


