[Foundation-l] Deleting blatant copyright violations from the database

Marco Chiesa chiesa.marco at gmail.com
Mon Aug 13 09:51:29 UTC 2007

Brian wrote:

>[0] is the addition of an abstract from the journal Nature [1]. It was in
>the encyclopedia for four months until I accidentally found it. I was told
>in IRC that the procedure for this situation is to simply remove the change
>from the current revision of the article, because it is technically
>difficult to permanently remove things from the database. This seems
>incredibly problematic to me. From a legal perspective, I don't see any
>difference in viewing an old version of an article which contains a
>copyright violation, and that copyright violation still being in the current
>version. There is some effort to hide old revisions from search engines, but
>the violation still exists on the Internet, and the copyright owner's rights
>are still being violated.

I'm always surprised at the very lax attitude en.wikipedia has
towards copyright violations. On it.wikipedia we take a much more
draconian approach: if a potential copyright violation is present
(usually at least one sentence copied from another website), every
version in the history containing that text is deleted, and, if there
are good edits in between, a note listing the deleted revisions is
added to the talk page. This is sometimes awful work for the sysop who
has to do it, because pages from which a copyvio has been removed
sometimes get edited with another copyvio. The risk is that previously
deleted versions may be restored by mistake, since the only way to
remove a version is to delete the page and then restore the good
versions. So, for heavily edited pages, from time to time we move the
old versions to another title, which we protect, so that the history
does not get too long (we did this for the village pump a couple of
times, then we switched to having a separate page for each thread,
which gets included in the weekly pump).

We also have a bot (RevertBot) which checks all edits with Google and
Yahoo and creates a page of suspected copyvios; a sysop then has to
check each one manually to determine who copied from whom. Occasionally
we have discovered copyvios on en.wikipedia that had been there for
more than a year.

We also had a case of a trusted user with 30k edits, who had been a
sysop in the past, who was caught copying large chunks of text from
printed encyclopedias. That forced us to set up a project to
selectively remove all of his non-typo edits as suspected copyvios,
which also destroyed quite a bit of work by honest users who had fixed
his edits or added material that did not make sense to keep on its own
but was valuable nonetheless.