[Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

Sat Jun 20 23:56:34 UTC 2009

Anthony wrote:
> Wow, what's Wikipedia's policy about using a bot to scrape everything?
>   

I don't know about any policy, but I think it should still be 
discouraged.  For me this has less to do with predation on other sites 
than with our inability to keep up with the volume of data that would be 
produced.  Proofreading and wikifying are labour-intensive processes.  
It is very easy for the technically minded to bring the scan and OCR of 
a 500-page book under our roof, but without the manpower to add that 
value, such imports are scarcely better than data dumps.

Ec
> On Sat, Jun 20, 2009 at 2:47 PM, Brian <Brian.Mingus at colorado.edu> wrote:
>   
>> That is against the law. It violates Google's ToS.
>>
>> I'm mostly complaining that Google is being Very Evil. There is nothing we
>> can do about it except complain to them, which I don't know how to do.
>> They apparently believe that the plain text versions of their books are
>> akin to their intellectual property and are unwilling to give them away.
>> their intellectual property and are unwilling to give them away.
>>
>> On Sat, Jun 20, 2009 at 12:34 PM, Falcorian wrote:
>>     
>>> So the bot just has to run at human speeds so it does not get banned, it
>>> still won't get tired or make unpredictable mistakes. And you can run it
>>> from different IPs to parallelize.
>>>
>>> --Falcorian