On Fri, Dec 10, 2010 at 9:54 PM, WJhonson@aol.com wrote:
In a message dated 12/10/2010 12:48:31 PM Pacific Standard Time, jamesmikedupont@googlemail.com writes:
I am not talking about books, just webpages.
Let's take ladygaga.com as an example.
Wayback engine : http://web.archive.org/web/*/http://www.ladygaga.com
Google cache: http://webcache.googleusercontent.com/search?q=cache:1720lEPHkysJ:www.ladyga...
Here are two copies of copyrighted material. We should make sure that the webpages we reference are in archive.org or mirrored on some server. Ideally we would have our own search engine and cache.
mike
I have no problem with the idea of pointing refs to a page on archive.org. However, you must understand that even previously archived pages *may* be removed from archive.org at the owner's request, or even because of a robots.txt entry.
The only advantage I see in using archive.org instead of a plain link is the ability to see what a page *looked* like in the past. I'm not sure that's a great advantage. Why do you think it is? If a page comes down, should we not err on the side of assuming the owner no longer wants it public? And if the owner doesn't want it public, are we to make sure it stays public by caching it against their will?
Both Google and Archive.org (much to my utter dismay) obey certain rules set up by web page owners to not index certain pages, or to remove them from caching history entirely (even old copies). Are you suggesting we disregard those rules? If not, then I see no advantage in our caching pages which are available in caches already.
My point is that we should index them ourselves. The pages used as references should first be listed in an easy-to-use manner, and if possible we should cache them. If they are not cacheable because of some restriction, the references should be marked somehow as less good, so that people might find better ones. In the end, as with CiteSeer, you will find that pages that are available, open, and cacheable get cited and used more than pages that are not.
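To make the "mark them somehow" idea concrete, here is a rough sketch in Python that flags each reference URL as cacheable or not, based on the site's robots.txt rules. It uses the standard library's `urllib.robotparser`. The names `is_cacheable`, `mark_references`, and the `refcache-bot` user agent are made up for illustration, and the robots.txt text is passed in as a string rather than fetched, so it only shows the marking step. Treating a host with no known robots.txt as cacheable is an assumption, and this does not cover pages removed at the owner's direct request.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_cacheable(robots_txt: str, url: str, user_agent: str = "refcache-bot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def mark_references(urls, robots_by_host):
    """Pair each reference URL with a cacheable flag.

    robots_by_host maps a host name to the text of its robots.txt.
    Hosts missing from the mapping are treated as unrestricted
    (an assumption made for this sketch).
    """
    marked = []
    for url in urls:
        host = urlsplit(url).netloc
        robots_txt = robots_by_host.get(host, "")
        marked.append((url, is_cacheable(robots_txt, url)))
    return marked
```

References flagged `False` would be the ones to mark as weaker, so that editors can look for a replacement that can be cached.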
Right now, I don't know of a simple way to even get this list of references from Wikipedia. There is a lot of work to do, and if we do it, it will benefit Wikipedia. Another thing to do is to translate the referenced pages.
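One possible starting point, offered only as a sketch: the MediaWiki API has a `prop=extlinks` query that lists the external links on a page. That covers every external link, not only those inside references, so it is an approximation of the list wanted here. The Python below builds such a query URL and pulls the URLs out of a decoded JSON response. The function names are hypothetical, the English Wikipedia endpoint is an assumption, and the parser accepts both the `"*"` and `"url"` keys because the key depends on the response format version. It does not perform the HTTP request or handle continuation for long pages.

```python
from urllib.parse import urlencode

# Assumed endpoint; other language editions use their own host name.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extlinks_url(title: str) -> str:
    """Build an API URL asking for the external links of one page."""
    params = {
        "action": "query",
        "prop": "extlinks",
        "titles": title,
        "ellimit": "max",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_extlinks(response: dict) -> list[str]:
    """Collect external link URLs from a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    # "pages" is a dict keyed by page id in the older JSON format
    # and a plain list in the newer one; accept either.
    if isinstance(pages, dict):
        pages = pages.values()
    urls = []
    for page in pages:
        for link in page.get("extlinks", []):
            url = link.get("url") or link.get("*")
            if url:
                urls.append(url)
    return urls
```

The resulting list could then be checked against archive.org or a local cache, one URL at a time.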
mike