On 12/21/06, geni <geniice(a)gmail.com> wrote:
> > Trying to go
> > through the entire database by finding random pages and
> > taking random lines seems extremely hit-and-miss to me, and if you
> > have to worry about mirrors and false positives then I can't see how
> > that would possibly be productive. The odds of finding a copyvio are
> > going to be quite low, and the amount of time needed to sort through
> > them is going to be quite high.
>
> Daniel Brandt managed it.

Did he do it by using random pages? It strikes me that this would be
most easily done by downloading a copy of the database and working
through it systematically (you could filter out short articles while
you are at it).
FF