[Wikipedia-l] Google-test bot
Daniel Mayer
maveric149 at yahoo.com
Sun Aug 11 11:43:30 UTC 2002
I just ran into a fairly significant copyright infringer who copied text
wholesale into Joseph Wilson Swan and Internet Chess Club, and did a somewhat
complex job drawing on several websites for the Light bulb article (see
http://www.wikipedia.com/wiki/Talk:Light_bulb). The IP of the contributor who
did this was blocked as a result (pending his/her asking the list to have the
block removed).
The Internet Chess Club article is now fine because several users have since
edited the text extensively (removing "advertising"-like qualities; they
were not aware of the copyright violation). The Light bulb article is in a
similar state, but there are still some entire sentences that are word for
word the same as the source (not to mention the obvious parentage of many
other sentences).
My question is this: would it be possible for a bot to be programmed to
search Google for longish strings of text that are inserted into Wikipedia
and then log the results when strings are matched? This bot could crawl
through new diffs in Recent Changes and log possible violations that would
then have to be reviewed by a human to see if there is in fact any violation.
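
To make the idea concrete, here is a rough sketch of the diff-and-log step in
Python. It is only an illustration under several assumptions: the 8-word
threshold for a "longish" string is arbitrary, the function names are made up
for this example, and `search` is a hypothetical callable (phrase in, list of
matching URLs out) standing in for whatever search access the bot would
actually have. Fetching diffs from Recent Changes is left out entirely.

```python
import difflib

MIN_WORDS = 8  # assumed cutoff for a "longish" string; tune as needed


def inserted_passages(old_text, new_text, min_words=MIN_WORDS):
    """Return lines added in new_text that are long enough to be worth searching."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    added = []
    for line in difflib.ndiff(old_lines, new_lines):
        # ndiff marks lines present only in the new revision with "+ "
        if line.startswith("+ "):
            text = line[2:].strip()
            if len(text.split()) >= min_words:
                added.append(text)
    return added


def log_possible_violations(title, old_text, new_text, search):
    """Build log entries for a human to review.

    `search` is a hypothetical callable: phrase -> list of matching URLs.
    A match is only a hint; it does not establish a violation.
    """
    log = []
    for passage in inserted_passages(old_text, new_text):
        hits = search(passage)
        if hits:
            log.append((title, passage, hits))
    return log
```

Each log entry keeps the article title, the inserted passage and the matching
URLs together, so a reviewer can compare them directly.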
This will of course pick up a lot of public domain text, but if we also
encouraged users to 'cite their sources', these legit uses of text could be
quickly skipped over in the log. Common phrases would also be logged unless
the bot were programmed to minimize this, perhaps by centering each search
string on a period (so that it spans parts of two sentences).
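
The period-centering idea might look something like the following Python
sketch. The function name and the choice of five words on each side are
assumptions made for illustration, and the test for a sentence boundary is
deliberately naive (any word ending in "."), so abbreviations would trigger it
too.

```python
def period_centered_queries(text, words_each_side=5):
    """Return search strings that straddle a sentence boundary.

    Each string takes `words_each_side` words ending with the period and the
    same number of words from the start of the next sentence.
    """
    words = text.split()
    queries = []
    for i, word in enumerate(words):
        if not word.endswith("."):
            continue
        boundary = i + 1  # index of the first word after the period
        enough_left = boundary >= words_each_side
        enough_right = len(words) - boundary >= words_each_side
        if enough_left and enough_right:
            window = words[boundary - words_each_side:boundary + words_each_side]
            queries.append(" ".join(window))
    return queries
```

A string that spans two sentences is much less likely to be a stock phrase
than one taken from inside a single sentence, which is the point of the
heuristic. Text with no usable boundary simply produces no queries.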
The above copyright violations happened several days ago and were obviously
missed by the militia. Several users subsequently edited and expanded the
offending text, which causes major version control headaches when obvious
violations are still mixed with legit contribs.
Having a bot do much of the work would free up human resources (this also
would reduce duplication of effort -- sometimes several people will perform a
Google test on a suspicious entry while at other times nobody does a test).
Yes, I know this is an example of a bot that could be a good community member.
I take back my previous comment (it all depends on what the bot does).
-- Daniel Mayer (aka mav)