[WikiEN-l] Copyright Violation Bot

Fastfission fastfission at gmail.com
Fri Dec 22 00:17:21 UTC 2006


On 12/21/06, geni <geniice at gmail.com> wrote:
> > Trying to go through the entire database by finding random pages and
> > taking random lines seems extremely hit-and-miss to me, and if you
> > have to worry about mirrors and false positives then I can't see how
> > that would possibly be productive. The odds of finding a copyvio are
> > going to be quite low, and the amount of time needed to sort through
> > them is going to be quite high.
>
> Daniel Brandt managed it.

Did he do it by using random pages? It strikes me that it would be
most easily done if you downloaded a copy of the database and then
ran the check against that systematically (you could filter out short
articles while you are at it).
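
To make that concrete, here is a rough sketch of walking a downloaded
dump page by page and skipping short articles. It assumes the usual
MediaWiki XML export layout (page > title, page > revision > text); the
function names, the 500-character cutoff, and the tiny inline sample are
all made up for illustration, and the actual copyvio check is left out.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_long_articles(source, min_chars=500):
    """Stream a dump and yield (title, text) for pages with enough text.

    `source` is a filename or a binary file object. `min_chars` is an
    arbitrary, hypothetical cutoff for what counts as a "short" article.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if len(text) >= min_chars:
                yield title, text
            # Reset state and free memory so a large dump can be streamed.
            title, text = None, ""
            elem.clear()


# Tiny made-up sample standing in for a real dump file.
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">'
    b"<page><title>Stub</title><revision><text>Short.</text></revision></page>"
    b"<page><title>Long</title><revision><text>"
    + b"x" * 600
    + b"</text></revision></page></mediawiki>"
)

titles = [title for title, _text in iter_long_articles(io.BytesIO(SAMPLE))]
```

Going through the dump in order like this covers every article exactly
once, which random sampling cannot promise, and dropping short pages up
front cuts down the number of candidates that need checking.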

FF


