We allow web pages as sources under some reliability criteria, even though the web page might change or go away. We choose to live with this problem as a lesser evil than banning web sources altogether. But, I was wondering: could Wikipedia not keep its own archive of web pages used as sources? Or some such web pages? This is a minimally thought out proposal as you can tell!
Zero.
There might be some copyright problems involved with that proposal.
There is a partial solution, however. The Wayback Machine [http://www.archive.org/index.php] archives many web pages and their revisions. They have negotiated an exemption to the DMCA [http://www.archive.org/iathreads/post-view.php?id=82097], and so will probably be around for some time to come.
Jake
On Tue, 2007-01-02 at 05:33 -0800, zero 0000 wrote:
<snip>
The problem is their lag time: it takes over six months for material to appear there, by which time you are in deep trouble if it wasn't stored to begin with. It would be nice if every site referenced in Wikipedia were spidered automatically, just like every site visited by a surfer with an Alexa toolbar.
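As a rough illustration of the automatic-spidering idea above (this is only a sketch, not an actual Wikipedia or Alexa tool; the link-matching regex and the local storage scheme are my own assumptions):

```python
import hashlib
import pathlib
import re
import urllib.request

def external_links(wikitext):
    """Extract external link targets from a snippet of wikitext.

    Matches both bare URLs and bracketed links like [http://... label].
    """
    return re.findall(r"https?://[^\s\]|]+", wikitext)

def snapshot(url, store=pathlib.Path("link-archive")):
    """Fetch a cited page and keep a local copy, keyed by a hash of its URL."""
    store.mkdir(exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    with urllib.request.urlopen(url, timeout=30) as resp:
        (store / name).write_bytes(resp.read())
    return store / name

# Demonstration on a sample citation (snapshot() is not called here,
# since it would hit the network):
sample = "See [http://www.archive.org/index.php the Wayback Machine] for details."
for url in external_links(sample):
    print(url)
```

A real bot would of course walk article wikitext from a dump or the API rather than a sample string, and would need to respect robots.txt and copyright, as raised above.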
Mgm
On 1/2/07, Jake Waskett jake@waskett.org wrote:
<snip>
WikiEN-l mailing list WikiEN-l@Wikipedia.org To unsubscribe from this mailing list, visit: http://mail.wikipedia.org/mailman/listinfo/wikien-l
I have an idea.
How about, from now on, every time any of us uses a web source, we save the page so that it'll be around in the future? Not necessarily making this a policy, or a guideline, but a Really Good Idea.
Once it goes off the Internet, we will still have it for at least our own use.
Or how about a Library of Congress-esque thing for websites, where identified websites are selectively archived so they can exist for eternity (as opposed to the Wayback Machine, which seems to pick pages at random)?
On 1/2/07, MacGyverMagic/Mgm macgyvermagic@gmail.com wrote:
<snip>
A splendid idea for Internet-based research in general, and it's surprising how few people do this (myself included, I admit).
In terms of Wikipedia sourcing, however, it may create (or overlook) a problem: verifiability. What is the purpose, for Wikipedia, of doing so?
Is the purpose to be able to say, at some later date, "oh, that page must've been deleted. Want me to email you a copy of the original?" If so, is that verifiable? I don't think so - it may be impossible to contact you, for example. So how do Wikipedia readers benefit?
On the other hand, if the page concerned contained quotations or cited other sources that may help in locating a *different* source, then it is certainly helpful, albeit indirectly.
My concern is that encouraging saving of pages - however unofficially - may be seen to imply the former, perhaps undermining vital policies. But this may be avoidable.
Jake
On Tue, 2007-01-02 at 16:09 -0500, James Hare wrote:
<snip>
On 02/01/07, James Hare messedrocker@gmail.com wrote:
Or how about a Library of Congress-esque thing for websites, where identified websites are selectively archived so they can exist for eternity (as opposed to the Wayback Machine, which seems to pick pages at random)?
Already in progress, in various forms... several national libraries are slowly kicking "online legal deposit" (or something similar) into place, and many countries have fiddled the copyright laws to permit it.
(Denmark is the only one I know of fully up and running, but the system is getting there)
James Hare wrote:
Or how about a Library of Congress-esque thing for websites, where identified websites are selectively archived so they can exist for eternity (as opposed to the Wayback Machine, which seems to pick pages at random)?
Someone else on this list recently pointed me to www.webcitation.org, which seems to exist for exactly this purpose. I've been gradually going through articles I've edited in the past and submitting any web pages I've cited for archiving, and I encourage others to do the same. Their bookmarklet feature is particularly convenient for this.
Sorry, I should have been more clear. My thought was that, when linking to a page, it might be worth checking first whether archive.org has a copy and, if so, linking to that version.
Your second comment is interesting. An argument could be made that Wikipedia represents a selection of the more useful pages on the Internet, and I wonder whether the folks at archive.org have considered this at all. Is it worth contacting them?
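The check-archive.org-first step could be automated along these lines. This is a sketch only: the availability endpoint and the JSON shape assumed here are assumptions about archive.org's interface, not something confirmed in this thread, so check their documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed archive.org availability endpoint (an assumption, verify with their docs).
API = "https://archive.org/wayback/available"

def availability_query(url):
    """Build the availability-check URL for a cited page."""
    return API + "?" + urllib.parse.urlencode({"url": url})

def wayback_copy(url):
    """Return the closest archived snapshot's URL, or None if there is none."""
    with urllib.request.urlopen(availability_query(url), timeout=30) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None
```

An editor could then cite the snapshot URL alongside (or instead of) the live page when one exists.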
Jake
On Tue, 2007-01-02 at 20:01 +0100, MacGyverMagic/Mgm wrote:
<snip>
On 02/01/07, MacGyverMagic/Mgm macgyvermagic@gmail.com wrote:
The problem is their lag time: it takes over six months for material to appear there, by which time you are in deep trouble if it wasn't stored to begin with. It would be nice if every site referenced in Wikipedia were spidered automatically, just like every site visited by a surfer with an Alexa toolbar.
We could, of course, ask them to add "any WP extlink" to their spidering routines...
archive.org's six-month delay is intentional, but I suppose it would be possible for them to display some form of "we have this site archived on X date, just not displayed yet" identifier on the date-selection page; this would obviate the "not known" problem whilst meaning they don't have to publish it. Hmm. If anyone wants to propose it to them, feel free.
On Wed, 03 Jan 2007 18:08:22 +0100, Andrew Gray shimgray@gmail.com wrote: <snip>
archive.org's six-month delay is intentional, but I suppose it would be possible for them to display some form of "we have this site archived on X date, just not displayed yet" identifier on the date-selection page; this would obviate the "not known" problem whilst meaning they don't have to publish it. Hmm. If anyone wants to propose it to them, feel free.
Actually, I don't think they physically get the data until 6 months have passed. The founder of the archive explained this in an interview with Nerd TV[1]. Basically, back in the day they started Alexa and the Internet Archive at the same time. Alexa was for-profit and the archive is non-profit, and they had a contract between the two that once Alexa was "done" with the data they collected (a 6-month delay to take the "commercial edge" off it), it was handed over to the Internet Archive. Alexa has since changed ownership, but the contract to supply data to the archive remains in effect, so until 6 months have passed it's actually Alexa, not the Internet Archive, that has that data, as I understand it.
1: http://www.pbs.org/cringely/nerdtv/transcripts/004.html
P.S. To increase the odds of a particular page getting archived (with the caveat that some sites include no-archive directives in their robots.txt or meta HTML headers; such sites are not archived), visit it with a browser that has the Alexa toolbar installed (yuck), or use Internet Explorer (yuck) and choose Tools -> What's Related (or some such), which will also cause Alexa to crawl the page as I understand it (they supply that feature; I don't know if it's in IE 7), or just put the URL into the form at http://www.alexa.com/site/help/webmasters/#crawl_site
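A rough sketch of checking for the no-archive directives mentioned above, before bothering to submit a page. The "ia_archiver" user-agent string and the exact meta-tag pattern are assumptions on my part, not details from this thread:

```python
import re
import urllib.parse
import urllib.robotparser

def crawl_allowed(url, agent="ia_archiver"):
    """Check a site's robots.txt for the archive crawler.

    "ia_archiver" is assumed to be the Internet Archive's user-agent;
    confirm against their documentation. Note this fetches robots.txt
    over the network.
    """
    parts = urllib.parse.urlsplit(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(agent, url)

def has_noarchive(html):
    """Detect a <meta name="robots" content="...noarchive..."> tag in page HTML."""
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noarchive'
    return re.search(pattern, html, re.IGNORECASE) is not None
```

If both checks pass, the page is a reasonable candidate for submission via the Alexa form above.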
On 1/3/07, Andrew Gray shimgray@gmail.com wrote:
<snip>
Yes, that's what I thought. Worth asking them. It's a pain to have pages disappear and lose verifiable content as a result.
Mgm