[Wikipedia-l] Google-test bot

Daniel Mayer maveric149 at yahoo.com
Sun Aug 11 11:43:30 UTC 2002


I just ran into a fairly significant copyright infringer who copied text 
wholesale into Joseph Wilson Swan and Internet Chess Club, and did a somewhat 
more complex job drawing on several websites for the Light bulb article (see 
http://www.wikipedia.com/wiki/Talk:Light_bulb). The IP of the contributor who 
did this was blocked as a result (pending his/her asking the list for a block 
removal). 

The Internet Chess Club article is now fine because several users have since 
edited the text extensively (removing its advertising-like qualities; they 
were not aware of the copyright violation). The Light bulb article is in a 
similar state, but some entire sentences are still word-for-word copies 
(not to mention the obvious parentage of many other sentences).

My question is this: would it be possible to program a bot to search Google 
for longish strings of text that are inserted into Wikipedia and then log 
the results when strings are matched? This bot could crawl through new diffs 
in Recent Changes and log possible violations, which a human would then have 
to review to see whether there is in fact any violation. 
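
A minimal sketch of what the core of such a bot might look like, in Python. 
Everything named here is hypothetical: `inserted_text`, `check_revision`, and 
the `min_words` and `query_words` thresholds are illustrative, and `search` 
is a caller-supplied function standing in for whatever search interface is 
actually available (no particular Google API is assumed). Fetching diffs from 
Recent Changes is left out.

```python
import difflib


def inserted_text(old, new):
    """Return the runs of words that appear in `new` but not in `old`."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    runs = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce new words on the `new` side.
        if tag in ("insert", "replace"):
            runs.append(" ".join(new_words[j1:j2]))
    return runs


def check_revision(title, old, new, search, min_words=8, query_words=8):
    """Return (title, phrase, hits) entries for a human to review.

    `search` is an assumed callable: it takes a quoted query string and
    returns a list of matching URLs (empty if nothing matched).
    """
    log = []
    for run in inserted_text(old, new):
        words = run.split()
        if len(words) < min_words:
            continue  # too short to be a meaningful match
        phrase = " ".join(words[:query_words])
        hits = search('"' + phrase + '"')
        if hits:
            log.append((title, phrase, hits))
    return log
```

The bot only logs; deciding whether a hit is a real violation stays with the 
human reviewer, as described above.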

This will of course pick up a lot of public domain text, but if we also 
encouraged users to 'cite their sources', these legit uses of text could be 
quickly skipped over in the log. Common phrases would also be logged unless 
the bot were programmed to minimize this, perhaps by centering each search 
string on a period (thus spanning parts of two sentences). 
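
The period-centering idea could be sketched roughly as follows, again in 
Python. The function name `period_centered_phrases` and the 
`words_each_side` parameter are hypothetical, and treating "a period 
followed by whitespace" as a sentence boundary is a simplifying assumption 
(it will also fire on abbreviations).

```python
import re


def period_centered_phrases(text, words_each_side=6):
    """Return phrases that straddle a sentence boundary in `text`."""
    phrases = []
    # Assumed boundary: a period followed by whitespace.
    for match in re.finditer(r"\.\s+", text):
        before = text[:match.start()].split()[-words_each_side:]
        after = text[match.end():].split()[:words_each_side]
        # Skip boundaries too close to the start or end of the text.
        if len(before) < words_each_side or len(after) < words_each_side:
            continue
        phrases.append(" ".join(before) + ". " + " ".join(after))
    return phrases
```

A stock phrase is unlikely to end one sentence and begin the next in the 
same way on two unrelated pages, so a match on such a string is more 
suggestive of copying than a match inside a single sentence.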

The above copyright violations happened several days ago and were obviously 
missed by the militia. Several users have since edited and expanded the 
offending text, which causes major version control headaches when obvious 
violations are still mixed with legit contribs.

Having a bot do much of the work would free up human resources (this also 
would reduce duplication of effort -- sometimes several people will perform a 
Google test on a suspicious entry while at other times nobody does a test).

Yes, I know this is an example of a bot that could be a good community member. 
I take back my previous comment (it all depends on what the bot does). 

-- Daniel Mayer (aka mav) 


