On 10/22/05, Elisabeth Bauer <elian(a)djini.de> wrote:
Today, I accidentally discovered a massive amount of copyvios from
Microsoft Encarta in the German Wikipedia by a single user (who
unfortunately used an excessive number of sockpuppet accounts). For
anyone interested, the list of detected copyvios is here:
*
http://de.wikipedia.org/wiki/Benutzer:Peterlustig/Encarta_URVs
What we did was: go through all the article contributions by this user
manually, pick the bigger text insertions, and then look up the entries
in Encarta and other sources. We discovered a lot of copyvios this way,
but since I don't even know all the sockpuppets this user had, we have
no idea how many more are still in Wikipedia.
I wonder if we could use the toolserver for a good copyvio check system.
Wikimedia Germany could easily sponsor Britannica, Encarta and Brockhaus
DVDs, which would serve as a text base for comparison (if we manage to
access the texts somehow).
It would be great if one or more people would like to work on this;
apart from vandal-fighting tools, this should have top priority.
One of the little techniques I've used in my text classification tools
here is a rolling hash. You take all of the Wikipedia article text,
strip out the markup, and canonicalize the text (squash whitespace,
simplify punctuation, flatten case). Then grab four contiguous words,
compute a hash (I was using the first 32 bits of an MD5, but something
faster could be used), store it, slide the window over by one word, and
repeat.
Eventually you get one 32-bit value for every word in Wikipedia. Throw
out the most frequent values (since they tend to just be common text).
The raw data from this is only about 1 GB for en, as I recall.
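The windowed hashing described above could be sketched roughly like
this in Python (a minimal sketch; the function names and the exact
canonicalization rules are my own assumptions, not the original tool):

```python
import hashlib
import re

def canonicalize(text):
    # Flatten case, strip punctuation, squash runs of whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def shingle_hashes(text, n=4):
    """One 32-bit hash per n-word window (first 32 bits of an MD5)."""
    words = canonicalize(text).split()
    hashes = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        digest = hashlib.md5(window.encode("utf-8")).digest()
        # Take the first 4 bytes of the digest as a 32-bit integer.
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes
```

Because canonicalization runs first, texts that differ only in case,
punctuation, or spacing produce identical hash sequences, which is the
point of the whole exercise.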
Run the same process against your comparison/check corpus. Then sort
articles by the number of hits and compare by hand.