[Foundation-l] excluding Wikipedia clones from searching

Fri Dec 10 20:48:06 UTC 2010

I am not talking about books, just webpages.

Let's take ladygaga.com as an example.

Wayback Machine (archive.org):
http://web.archive.org/web/*/http://www.ladygaga.com

Google cache:
http://webcache.googleusercontent.com/search?q=cache:1720lEPHkysJ:www.ladygaga.com/+lady+gaga&cd=1&hl=de&ct=clnk&gl=de&client=firefox-a

Here are two copies of copyrighted material. We should make sure that
the webpages we reference are in archive.org or mirrored on some server.
Ideally we would have our own search engine and cache.
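
A rough sketch of how we could check whether a referenced page already
has a copy in archive.org. It assumes the Wayback Machine availability
endpoint (https://archive.org/wayback/available) and a JSON reply with
an "archived_snapshots" / "closest" entry, as I understand it. The
function names are made up for illustration and are not part of any
existing tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the lookup URL for one referenced page (hypothetical helper)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(payload):
    """Return the snapshot URL from a decoded reply, or None if absent.

    Assumes the reply looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def is_archived(page_url, timeout=10):
    """Ask archive.org whether it holds a copy (needs network access)."""
    with urlopen(availability_url(page_url), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return closest_snapshot(payload) is not None
```

Something like is_archived("http://www.ladygaga.com") could then be run
over every external link we cite, and the pages that come back without
a snapshot would be the ones to submit for archiving or to mirror.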

mike

On Fri, Dec 10, 2010 at 9:00 PM,  <WJhonson at aol.com> wrote:
> In a message dated 12/10/2010 11:55:21 AM Pacific Standard Time,
> jamesmikedupont at googlemail.com writes:
>
>
> I mean Google has copies, caches of items for searching.
> How can Google cache this?
> Archive.org has copyrighted materials as well.
> We should be able to save backups of this material as well.
> mike
>
>
>
> Mike I believe your statement lacks evidence.
> I don't think either of these has available full copies of anything under
> copyright.
> If you can give an example, please do so, so I can look at your specific
> example.
>
> Google Books has copies, not Google.  The full readable copies are all under
> public domain.
> The snippet views are not.  The preview views mean that they actually
> received *permission* from the copyright holder to do a preview view.
>
> That's why it's very rare to find a preview view for any book that predates
> the internet!  You either get snippet or full.
> Probably the author is actually dead, and they can't find who holds the
> copyright easily today.  Or it's too much trouble for a book that fifteen
> people look at.
>
> W

-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova and Albania
flossk.org flossal.org