[WikiEN-l] Ephemeral nature of web pages - a silly idea?

Wed Jan 3 18:36:56 UTC 2007

On Wed, 03 Jan 2007 18:08:22 +0100, Andrew Gray <shimgray at gmail.com> wrote:
<snip>
> archive.org's six-month delay is intentional, but I suppose it could
> be possible for them to display some form of "we have this site
> archived on X date and just not displayed yet" identifier to the
> date-selection page; this would obviate the "not known" problem whilst
> meaning they don't have to publish it. hmm. If anyone wants to propose
> it to them, feel free.

Actually, I don't think they physically get the data until 6 months have
passed. The founder of the archive explained this in an interview with Nerd
TV[1]. Basically, back in the day they started Alexa and the Internet
Archive at the same time. Alexa was for-profit and the archive is
non-profit, and the two had a contract under which, once Alexa was
"done" with the data it collected (a 6-month delay to take the
"commercial edge" off it), the data was handed over to the Internet
Archive. Alexa has since changed ownership, but the contract to supply
data to the archive remains in effect, so until 6 months have passed it's
actually Alexa, not the Internet Archive, that has the data, as I
understand it.

1: <http://www.pbs.org/cringely/nerdtv/transcripts/004.html>

P.S. To increase the odds of a particular page getting archived (with the
caveat that some sites include no-archive directives in their
robots.txt or HTML meta headers; such sites are not archived), visit it
with a browser that has the Alexa toolbar installed (yuck), or use Internet
Explorer (yuck) and choose Tools -> What's related (or some such), which
will also cause Alexa to crawl the page as I understand it (they supply
that feature, dunno if it's in IE 7), or just put the URL into the form at
<http://www.alexa.com/site/help/webmasters/#crawl_site>
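
For anyone who wants to check the robots.txt caveat before bothering, here
is a rough sketch in Python using the standard library's robots.txt
parser. Note the assumptions: the crawler name "ia_archiver" is what I
believe Alexa's crawler identifies itself as, and the sample robots.txt,
the example.com URLs and the helper name is_archivable are made up for
illustration. It only looks at robots.txt, not at meta tags in the page.

```python
from urllib.robotparser import RobotFileParser


def is_archivable(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`.

    `agent` defaults to "ia_archiver", which is an assumption about the
    name the Alexa crawler uses; pass another name to test other crawlers.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt: blocks the archive crawler from the whole site,
# and blocks every other crawler from /private/ only.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    page = "http://example.com/page.html"
    print("ia_archiver allowed:", is_archivable(SAMPLE_ROBOTS, page))
    print("other bot allowed:", is_archivable(SAMPLE_ROBOTS, page, "SomeOtherBot"))
```

To test a real site you would fetch its /robots.txt first and pass the text
in; if the result is False for the archive crawler, none of the tricks
above are likely to help.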

-- 
[[:en:User:Sherool]]