On Fri, 2008-10-17 at 19:23 +0000, wikien-l-request@lists.wikimedia.org wrote:
From: Nathan nawrich@gmail.com

[...] multiple references to a website that has disappeared. Question for the panel: is it better to just leave the links as is (with a note that the site does not exist anymore), remove them altogether, or replace the links with archive.org links?
I, for one, would say that you should just do what you would do with offline sources; there is no reason to treat online sources differently. When you cite a journal, it is the reader's responsibility to go and find it in a library, not yours, as long as you give all the information necessary to locate a copy of the journal, if one exists at all. And if the journal goes out of print, and every library in the world somehow decides to burn its copies, that is still not your responsibility as an author or editor.
A dead link is like a book that is out of print. It is hard to find, but it was published at some point, so it is appropriate to cite it, as long as you include the access date (a short quotation would help too).
Your responsibility as an author is to provide proper references that would enable one to locate the source if copies exist, and to represent the information accurately (e.g., if the source says "a bit of it", don't write "lots of it"). For links, as long as you cite the pages for information that is correct and truthful, and you provide proper citations (URL, access date, etc.), then you have done what is expected of you. Noting that a link is dead, or providing a link to a web archiver, is a good thing too.
There are some systems that let you take a snapshot of a webpage and keep it for future reference. Using them is a good thing, but not necessary: when you reference a book you don't make a snapshot of it, so you shouldn't be required to take snapshots of webpages just because webpages may go dead (books can be burned or go out of print, too).
However, do note that citing dead webpages, or live webpages that soon afterwards go dead, is a way to commit undetectable vandalism. There is no easy solution to this, unless one is willing to exclude dead links entirely.
Furthermore, the responsibilities of the author have to be balanced against the rights of the reader. The reader has a right to check your work for accuracy, and citations are supposed to satisfy that right; with the web, that system now appears to be broken (with books and journals, it was very unlikely for a paper source to disappear from every library in the world at once). So one could say that dead links are not very useful for readers, particularly those unfamiliar with citation systems. The author has a responsibility to provide sources and to assist in finding them by giving proper publication dates, access dates, and other information, but is not responsible for actually keeping a copy of them, or for finding them again after an article is written. The reader, however, has a right to check the author's accuracy, and so the volatility of the web appears to be a disservice to readers.
Perhaps the best solution would be to build a web archiving platform in Wikipedia itself, so that all referenced webpages are stored for later retrieval.
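A toy sketch of the core of such a platform might look like the following. To be clear, everything here (storage root, layout, names) is hypothetical; nothing like this existed as a WMF service, and a real system would need deduplication, robots handling, and a retrieval frontend:

#!/usr/bin/env python3
# Toy sketch of an in-wiki archiving store: at citation time, fetch the
# referenced page and file it under a hash of its URL plus a timestamp,
# so a reader can later retrieve the page as it was when it was cited.
# The storage root and directory layout are hypothetical.
import hashlib
import pathlib
import time
import urllib.request

ARCHIVE_ROOT = pathlib.Path("/srv/wiki-archive")  # hypothetical location

def snapshot(url):
    """Fetch url and store the raw bytes; return the snapshot's path."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    path = ARCHIVE_ROOT / key / (stamp + ".html")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path

if __name__ == "__main__":
    print(snapshot("http://example.org/"))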
On 2008.10.18 01:29:32 +0300, nsk nsk@karastathis.org scribbled 3.3K characters: ....
Perhaps the best solution would be to build a web archiving platform in Wikipedia itself, so that all referenced webpages are stored for later retrieval.
-- Thanks, NSK Nikolaos S. Karastathis, http://nsk.karastathis.org/
I actually once wrote a bot* which processed a dump for external links and submitted them to webcitation.org. I stopped running it because the link requests didn't seem to be resulting in URLs being archived, but that was back in May. (Perhaps things have changed since then.) How much of the solution would such a bot represent? Could the solution be as cheap as a post-page-save hook which submits all http:// links in the wikitext to webcitation.org?
* https://secure.wikimedia.org/wikipedia/en/wiki/User:Gwern/Archive-bot.hs
-- gwern
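For readers who want the shape of the idea without reading the Haskell, a rough Python equivalent might look like this. The WebCite endpoint and query parameters below are assumptions on my part, not a documented API; check webcitation.org's actual submission interface (Archive-bot.hs above is the real reference implementation):

#!/usr/bin/env python3
# Rough sketch of the archive-bot idea: pull http:// links out of
# wikitext and submit each one to an archiving service. The WebCite
# endpoint and parameters are assumptions, not a documented API.
import re
import time
import urllib.parse
import urllib.request

ARCHIVE_ENDPOINT = "http://www.webcitation.org/archive"  # assumed
CONTACT_EMAIL = "bot-operator@example.org"               # placeholder

# Crude link extraction; a real bot would read the externallinks table
# from a database dump instead of regexing wikitext.
LINK_RE = re.compile(r'https?://[^\s\]<>"|}]+')

def extract_links(wikitext):
    return sorted(set(LINK_RE.findall(wikitext)))

def request_archive(url):
    query = urllib.parse.urlencode({"url": url, "email": CONTACT_EMAIL})
    with urllib.request.urlopen(ARCHIVE_ENDPOINT + "?" + query) as resp:
        return resp.getcode()

if __name__ == "__main__":
    sample = "See [http://example.org/report.html the report] for details."
    for link in extract_links(sample):
        print(link, request_archive(link))
        time.sleep(5)  # throttle; be polite to the archiving service

The post-page-save-hook variant would fire the same request from MediaWiki after each edit, rather than from a batch pass over a dump.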
We (the greater WP community) know people at the Internet Archive.
One could imagine a bot which submitted a list of WP reference URLs to the Archive so that they could be preferentially added to the archive library, via a process worked out with Archive people...
Alternately, another online citation archiving service could be set up as a new WMF project, specifically to support the various WMF projects.
-george william herbert george.herbert@gmail.com
On 2008.10.18 19:23:11 -0700, George Herbert george.herbert@gmail.com scribbled 1.8K characters:
We (the greater WP community) know people at the Internet Archive.
Yes, I've heard this before, although for the life of me I haven't the slightest idea what good that has done us (except I've heard they may or may not back up our dumps).
One could imagine a bot which submitted a list of WP reference URLs to the Archive so that they could be preferentially added to the archive library, via a process worked out with Archive people...
The IA already has a little public form for requesting that URLs be archived; unfortunately, the description accompanying it strongly suggests to me that such requests are basically ignored. I'd be perfectly fine with a machine-accessible form, as long as it worked. For referencing purposes, it's fine if the IA has to embargo the page for 6 or 9 months or whatever - as long as it shows up eventually.
Alternately, another online citation archiving service could be set up as a new WMF project, specifically to support the various WMF projects.
-george william herbert
I don't think that's really a productive path: what's our motivation for duplicating the work of both the IA and Webcitation.org (and Lor' knows who else)? All we need from them are small changes, and then they'd be fine for our purposes.
-- gwern
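On the "as long as it shows up eventually" point: checking whether a cited URL eventually made it into the Wayback Machine can itself be automated. The availability endpoint below is archive.org's present-day JSON API, not something that existed when this thread was written, so treat this as a sketch of the verification step rather than of anything available at the time:

#!/usr/bin/env python3
# Ask archive.org's availability API whether a snapshot of a cited URL
# exists (a present-day endpoint, not one from the era of this thread).
import json
import urllib.parse
import urllib.request

def wayback_snapshot(url, timestamp=""):
    """Return the closest snapshot URL, or None if none is available.
    timestamp (YYYYMMDD...) biases the lookup toward that date."""
    query = {"url": url}
    if timestamp:
        query["timestamp"] = timestamp
    api = "http://archive.org/wayback/available?" + urllib.parse.urlencode(query)
    with urllib.request.urlopen(api) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"]
    return None

if __name__ == "__main__":
    print(wayback_snapshot("http://example.org/", "20081018"))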
Also, if a priority system for archive.org were created, wouldn't there be a potential for spammers simply to get their links indexed by the Wayback Machine? I know we have nofollow on our off-site links, but I cannot imagine how we would sort out a problem like that once it was created.
Just my 2 cents.
- Chris
On Sat, Oct 18, 2008 at 01:29:32AM +0300, nsk wrote:
A dead link is like a book that is out of print. It is hard to find, but it was published at some point, so it is appropriate to cite it, as long as you include the access date (a short quotation would help too).
A book that is out of print but available in numerous libraries is, of course, treated differently from a book of which no copies remain in existence.
Web links should be treated the same. Web content that has gone dead but has been archived elsewhere is not problematic. Web content that has been lost to history is no longer valid as a reference on its own.
- Carl