Wow, what's Wikipedia's policy about using a bot to scrape everything?
On Sat, Jun 20, 2009 at 2:47 PM, Brian Brian.Mingus@colorado.edu wrote:
That is against the law. It violates Google's ToS.
I'm mostly complaining that Google is being Very Evil. There is nothing we can do about it except complain to them. Which I don't know how to do - they apparently believe that the plain text versions of their books are akin to their intellectual property and are unwilling to give them away.
On Sat, Jun 20, 2009 at 12:34 PM, Falcorian alex.public.account+WikimediaMailingList@gmail.com wrote:
So the bot just has to run at human speeds so it does not get banned; it still won't get tired or make unpredictable mistakes. And you can run it from different IPs to parallelize.
--Falcorian
On Sat, Jun 20, 2009 at 11:04 AM, Brian Brian.Mingus@colorado.edu wrote:
Not likely. I've been banned from Google's regular search at least a dozen times during semi-frenetic search sprees in which I was identified as a bot. There is no doubt that if you try to automate it you will be quickly shot down.
On Sat, Jun 20, 2009 at 12:02 PM, Platonides Platonides@gmail.com wrote:
Brian wrote:
Unfortunately the only way I've found to download the full text of a public domain book from Google is to flip through the book a page at a time, copying the text to your clipboard. There are roughly 2-3 million public domain books in Google Books.
That's easy to fix :)
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l