[WikiEN-l] Copyright Violation Bot

Earle Martin wikipedia at downlode.org
Fri Dec 22 11:32:40 UTC 2006


On 21/12/06, Fastfission <fastfission at gmail.com> wrote:
> The odds of finding a copyvio are going to be quite low.

Did you try actually running the program? I guess not.

Let's have another demonstration. I just ran the program a few times.
On the fifth try it got me this:

    http://en.wikipedia.org/wiki/Hilal-i-Jurat

which is copied wholesale from this copyrighted text:

    http://faculty.winthrop.edu/haynese/medals/Pakistan/pakistan.htm

On the very next try, I got:

    http://en.wikipedia.org/wiki/A._J._Seymour

which was either copied from or copied to this website (either way,
without attribution):

    http://www.triste-le-roi.blogspot.com/ajs_main.html

The revision history indicates the person responsible was the
maintainer of the website. Six or seven more program runs later, I
got:

    http://en.wikipedia.org/wiki/Abd%C3%BClhak_H%C3%A2mid

which contains material copied from:

    http://www.osmanli700.gen.tr/english/individuals/a19.html

My conclusion? First, the basic idea of the program is a sound one.
Second, there are thousands more copyvios on the English Wikipedia
than most people would ever imagine.

By the way, my program has a second use: you can also run it on a
specific page if you suspect that it contains copyrighted text.
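
For anyone curious what a check on a single page could look like, here
is a minimal sketch. It is not the code of my program, and the method
is an assumption made for illustration: it compares two blocks of text
by the word sequences ("shingles") they share. The names shingles and
overlap_ratio and the default of five words per shingle are hypothetical
choices. Fetching the article and the suspected source is left out; the
sketch only compares text you already have.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    # If the text has fewer than n words, the range is empty and so is the set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(article, source, n=5):
    """Fraction of the article's shingles that also appear in the source."""
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0
    shared = article_shingles & shingles(source, n)
    return len(shared) / len(article_shingles)
```

A ratio near 1.0 would mean most of the article's phrasing also appears
in the other text; a ratio near 0.0 would mean almost none does. A
human still has to decide who copied from whom.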


-- 
Earle Martin
            http://downlode.org/
http://purl.org/net/earlemartin/


