It would appear that we are breaking the law with respect to copyright at download.wikimedia.org.
This is a result of our policy about the insertion of copyrighted material into articles:
[[Wikipedia:Copyright problems#Instructions]] "Pages where the most recent edit is a copyright violation, but the previous article was not, should not be deleted. They should be reverted. The violating text will remain in the page history for archival reasons unless the copyright holder asks the Wikimedia Foundation to remove it."
As a result, our database contains large quantities of violating material. Because this material is completely untagged (in most cases the removal just looks like a normal revert), someone who wanted to redistribute the database without substantial liability would be unable to do so.
It is unfortunate that we will never be able to find and remove every copyright violation, but when we instruct our editors to sweep the violations they do discover under the rug, our actions could easily be construed as willful infringement.
At a minimum, we should instruct editors to mark copyvio reverts with a uniform tag. Where the copyvio was only in the most recent version, it would be fairly trivial (though computationally expensive) to sweep the DB and prune the violating revisions. Where other edits were made after the copyvio, there may be no automatic way to remove the violating text, but at least we should be tagging these changes.
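To make the proposed sweep concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the `Revision` record, the `sweep_page` function, and the tag string `[copyvio-revert]` are all hypothetical names, not anything that exists in the current software or policy. It assumes each page history is available oldest first and that editors put the tag in the edit summary of the revert. When a tagged revert restores the exact text of an earlier revision, everything in between is marked for pruning; when it matches no earlier revision, the page is only flagged for manual review.

```python
from dataclasses import dataclass

# Hypothetical uniform tag that editors would put in the revert's edit summary.
COPYVIO_TAG = "[copyvio-revert]"


@dataclass(frozen=True)
class Revision:
    """Simplified stand-in for one row of a page history (assumed shape)."""
    rev_id: int
    text: str
    comment: str


def sweep_page(history):
    """Scan one page history (oldest first) for tagged copyvio reverts.

    Returns (prune_ids, review_ids):
      prune_ids  - revisions between a tagged revert and the earlier
                   revision whose text it restored
      review_ids - tagged reverts that match no earlier revision, so the
                   violating revisions cannot be identified automatically
    """
    prune = set()
    review = []
    for i, rev in enumerate(history):
        if COPYVIO_TAG not in rev.comment:
            continue
        # Walk backwards to the most recent earlier revision with identical text.
        restored = next(
            (j for j in range(i - 1, -1, -1) if history[j].text == rev.text),
            None,
        )
        if restored is None:
            review.append(rev.rev_id)
        else:
            prune.update(r.rev_id for r in history[restored + 1:i])
    return sorted(prune), review
```

Comparing full revision text is what makes this expensive on the whole database; comparing a stored hash of each revision instead would be the obvious shortcut, if such a hash were available.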