[WikiEN-l] Copyright Violation Bot
geni
geniice at gmail.com
Thu Dec 21 16:51:29 UTC 2006
On 12/21/06, Fastfission <fastfission at gmail.com> wrote:
> On 11/24/06, Earle Martin <wikipedia at downlode.org> wrote:
> > Whether the copyvio is an inward or outward bound one in each case is
> > sadly beyond the scope of my programming skills, so I leave that to
> > you.
>
> I don't think this is a programming problem -- it's a conceptual problem.
>
> A good copyvio bot -- one which doesn't waste one's time with false
> positives or outward copyvios -- would be one which monitors NEW
> additions and does not try to parse previously existing material. If
> someone says, "This is new, original text" but it gets Google hits, it
> is almost certainly copy-and-pasted (whether that makes it officially
> a copyvio still needs to be decided, but it is a vastly simpler
> problem than the previous one).
>
This is already being done.
> Trying to go through the entire database by finding random pages and
> taking random lines seems extremely hit-and-miss to me, and if you
> have to worry about mirrors and false positives then I can't see how
> that would possibly be productive. The odds of finding a copyvio are
> going to be quite low, and the amount of time needed to sort through
> them is going to be quite high.
Daniel Brandt managed it.
--
geni