It's a gadget on no.wp, but in alpha state and there are some debugging
Is this script available on-wiki somewhere?
[mailto:firstname.lastname@example.org] On Behalf Of John Erling
Sent: August 27, 2008 4:01 PM
To: Wikimedia Quality Discussions
Subject: [Wikiquality-l] Detector for copyright violations
There have been several attempts to make bots that detect copyright
violations. The problem is that a lot of such "infringements"
are legal, quotations for example, and then the writers get pissed
because they have used the material in a completely legal way.
This script tries to avoid that by placing a user in the loop. The only
thing the script does is to mine the web for possibly similar texts.
Basically the script takes the added text, extracts the plain text,
excludes some of the text, breaks it into sentences, uses the sentences
to build a query, rematches the results to the sentences, accumulates
the matches and gives a warning if a match limit is reached.
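
The steps above could be sketched roughly like this in Python. This is
not the gadget's actual code, only an illustration of the pipeline. All
names are hypothetical, the markup stripping is deliberately crude, and
the web search is passed in as a function (`search`) because the post
does not say which search service is used.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical search interface: takes a query, yields (url, snippet) pairs.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]

def strip_markup(wikitext: str) -> str:
    """Crude plain-text extraction; real wikitext needs a proper parser."""
    text = re.sub(r"<ref[^>]*>.*?</ref>", " ", wikitext, flags=re.S)  # references
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)                      # simple templates
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)    # [[a|b]] -> b
    text = re.sub(r"'{2,}", "", text)                                # bold/italic marks
    return re.sub(r"\s+", " ", text).strip()

def drop_quotations(text: str) -> str:
    """Exclude quoted passages, since quotations are usually legitimate."""
    return re.sub(r'"[^"]*"', " ", text)

def split_sentences(text: str, min_words: int = 5) -> List[str]:
    """Split on sentence-ending punctuation; skip very short fragments."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if len(p.split()) >= min_words]

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace before comparing."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def check_added_text(added_wikitext: str, search: SearchFn,
                     match_limit: int = 2) -> Dict[str, List[str]]:
    """Return {url: matched sentences} for sources at or above match_limit."""
    plain = drop_quotations(strip_markup(added_wikitext))
    hits: Dict[str, List[str]] = {}
    for sentence in split_sentences(plain):
        query = '"%s"' % sentence                # exact-phrase query
        for url, snippet in search(query):
            # Rematch: only count results that really contain the sentence.
            if normalize(sentence) in normalize(snippet):
                hits.setdefault(url, []).append(sentence)
    return {url: found for url, found in hits.items()
            if len(found) >= match_limit}
```

The function only reports candidate sources. Deciding whether a match is
an infringement or a legal quotation is left to the user.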
At the moment I am trying to extend the system to older edits, and also
to make it a bit more resistant to small changes in the text. It is
already fairly robust against small reorganizations of the text.
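
One common way to tolerate small changes in the text is to compare
overlapping word n-grams ("shingles") instead of whole sentences. The
sketch below shows that general technique; it is an assumption for
illustration, not what the gadget does, and the function names and the
shingle size are made up.

```python
import re

def shingles(text, size=3):
    """Set of overlapping word n-grams, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(candidate, source, size=3):
    """Fraction of the candidate's shingles that also occur in the source."""
    cand = shingles(candidate, size)
    if not cand:
        return 0.0
    return len(cand & shingles(source, size)) / len(cand)

# Dropping one word from the source still leaves a high score.
score = overlap("The old church was built in 1150 by local farmers",
                "The old wooden church was built in 1150 by local farmers.")
print(score)  # 0.75
```

A single changed word only disturbs the few shingles that contain it, so
the score drops gradually instead of going straight to zero.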