On 12/21/06, geni <geniice@gmail.com> wrote:
Trying to go through the entire database by finding random pages and taking random lines seems extremely hit-and-miss to me, and if you have to worry about mirrors and false positives then I can't see how that would possibly be productive. The odds of finding a copyvio are going to be quite low, and the amount of time needed to sort through them is going to be quite high.
Daniel Brandt managed it.
Did he do it by using random pages? It strikes me that it would be most easily done if you downloaded a copy of the database and then ran the check against that copy systematically (you could filter out short articles while you are at it).
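To make the "download it and go through it systematically" idea concrete, here is a rough sketch in Python. It is only an illustration under assumptions: it assumes the dump is a MediaWiki XML export with <page>, <title>, <ns> and <text> elements, the function name iter_long_articles and the 2000-character cutoff are made up for the example, and the actual copyvio comparison (against whatever source you trust) is left out entirely.

```python
import xml.etree.ElementTree as ET

MIN_CHARS = 2000  # hypothetical cutoff for what counts as a "short" article


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_long_articles(dump, min_chars=MIN_CHARS):
    """Yield (title, text) for main-namespace pages with at least min_chars of text.

    `dump` is a filename or a binary file object holding a MediaWiki XML export.
    Pages are visited in dump order, so every article is seen exactly once.
    """
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        # Assumption: if the dump has no <ns> element, treat the page as an article.
        title, ns, text = None, "0", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "text":
                text = child.text or ""
        elem.clear()  # release the page's subtree so memory use stays bounded
        is_redirect = text.lstrip().upper().startswith("#REDIRECT")
        if ns == "0" and not is_redirect and len(text) >= min_chars:
            yield title, text
```

You would then loop over iter_long_articles("dump.xml") (the filename is a placeholder) and hand each article to whatever comparison step you settle on, rather than sampling random pages and random lines.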
FF