[Wikiquality-l] Detector for copyright violations

27 Aug 2008


There are several attempts to make bots that detect copyright
violations. The problem is that many such "infringements" are
legal, quotations for example, and then the writers get upset
because they have used the material in a completely legal way.
I have made a JavaScript-based solution that seems to solve the problem
by placing a user in the loop. The only thing the script does is mine
the web for possibly similar texts; the user decides what to make of them.
Basically, the script takes the added text, extracts the plain text,
excludes some of it, breaks it into sentences, uses the sentences
to build a search query, matches the results back against the sentences,
accumulates the matches and gives a warning if a match limit is reached.
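
The steps above could look roughly like the following Python sketch. It is not the actual script (which is JavaScript and is not included in the post). The markup-stripping rules, the choice to treat quoted passages and short sentences as the "excluded" text, the quoted-phrase `OR` query, the match limit of 2, and the `search` callable standing in for a web search engine are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical search interface: takes a query string, yields (url, page_text) pairs.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]


def extract_plain_text(wikitext: str) -> str:
    """Crudely strip common wiki markup (illustrative rules only)."""
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", wikitext, flags=re.S)  # footnotes
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)  # non-nested templates
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[target|label]] -> label
    text = re.sub(r"<[^>]+>", " ", text)  # leftover HTML tags
    text = re.sub(r"'{2,}", "", text)  # ''italic'' and '''bold''' markers
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str, min_words: int = 6) -> List[str]:
    """Exclude quoted passages, split on end punctuation, drop short sentences."""
    text = re.sub(r'"[^"]*"', " ", text)  # quotations are often legitimate use
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if len(p.split()) >= min_words]


def build_query(sentences: List[str], max_terms: int = 3) -> str:
    """Quote the longest sentences as exact phrases and OR them together."""
    longest = sorted(sentences, key=len, reverse=True)[:max_terms]
    return " OR ".join(f'"{s}"' for s in longest)


def normalize(s: str) -> str:
    """Lowercase and keep only word characters, so punctuation is ignored."""
    return " ".join(re.findall(r"\w+", s.lower()))


def find_suspects(wikitext: str, search: SearchFn, limit: int = 2) -> Dict[str, int]:
    """Return {url: matched_sentence_count} for pages reaching the match limit."""
    sentences = split_sentences(extract_plain_text(wikitext))
    if not sentences:
        return {}
    hits: Counter = Counter()
    for url, page in search(build_query(sentences)):
        padded_page = f" {normalize(page)} "
        for s in sentences:
            # Re-match every sentence against the returned page, whole words only.
            if f" {normalize(s)} " in padded_page:
                hits[url] += 1
    return {url: n for url, n in hits.items() if n >= limit}


if __name__ == "__main__":
    def fake_search(query: str):
        # Offline stand-in for a real search engine.
        return [("http://example.org/copy",
                 "The old river flows slowly through the quiet northern valley, "
                 "and farmers have grown barley on its banks for many centuries now.")]

    text = ("The [[river|old river]] flows slowly through the quiet northern valley. "
            "Farmers have grown barley on its banks for many centuries now.")
    for url, n in find_suspects(text, fake_search).items():
        print(f"Warning: {n} sentences also found at {url}")
```

The search function is passed in so the sketch runs offline; a real version would wrap whatever search service is queried, and the warnings would be shown to the user rather than acted on automatically.
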
At the moment I am trying to extend the system to older edits, and also to
make it a bit more resistant to small changes in the text. It is already
fairly resistant to small reorganizations of the text.
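
Matching whole normalized sentences already tolerates reordering, since each sentence is looked up on its own, but a single inserted or changed word breaks an exact match. One common way to tolerate such small edits, offered here as an assumption rather than a description of what the script does, is to compare overlapping word n-grams ("shingles") and accept a sentence when enough of them occur on the candidate page. The shingle size `k=3`, the `0.6` threshold and the helper names are hypothetical.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """All runs of k consecutive lowercase words in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(sentence: str, page: str, k: int = 3) -> float:
    """Fraction of the sentence's k-word shingles that also occur on the page."""
    own = shingles(sentence, k)
    if not own:
        return 0.0  # sentence shorter than k words: nothing to compare
    return len(own & shingles(page, k)) / len(own)


def is_near_copy(sentence: str, page: str, threshold: float = 0.6, k: int = 3) -> bool:
    """Hypothetical acceptance rule: enough shingles survive small edits."""
    return containment(sentence, page, k) >= threshold


if __name__ == "__main__":
    original = "The old river flows slowly through the quiet valley."
    edited_page = "Here the old river flows very slowly through the quiet valley."
    print(round(containment(original, edited_page), 3))
    print(is_near_copy(original, edited_page))
```

Because each sentence is scored independently, moving sentences around on the page leaves the score unchanged, while a single changed or inserted word removes at most `k` of the sentence's shingles.
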
John
