The website findarticles died in 2012, causing over 20 000 articles to have dead links on them. A few of them were backed up on Wayback, but their robots.txt changed so all those archives were deleted as well. So either articles have a dead link showing as 200 (which findarticles.com does) or they are claiming to be archived while they are not. Read more in my blog post about this: https://jonatanglad.wordpress.com/2015/06/29/findarticles-com/ Can we use a bot to remove all instances of this link, or should we go through them all manually? Can we use bots such as Citation bot (which is currently blocked) to find DOIs and other links to replace these links with? Ideas, people! Barely any of these links are tagged as dead, and they can't be tagged by Checklinks (unless done manually) since they show as 200. /Josve05a
Jonatan Svensson Glad
President of SSU Tyresö and Editor on Wikipedia
<redacted phone number> | gladjonatan@outlook.com
All views and opinions expressed in this email message are the personal opinions of the author and do not represent those of any organization which might be related to this message. No liability can be held for any damages, however caused, to any recipients of this message.
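The difficulty raised above, that a dead page can still answer with HTTP 200, is often called a "soft 404". As a rough sketch only (this is not how Checklinks works; the function name and the heuristic are assumptions made for illustration), a checker could compare the URL that was requested with the URL that was finally served, and flag a deep link that ends up on a site's root page:

```python
from urllib.parse import urlsplit


def link_looks_dead(requested_url: str, final_url: str, status: int) -> bool:
    """Heuristic guess at whether a cited link no longer serves its content.

    Hypothetical helper: 'final_url' is the URL the HTTP client ended up on
    after following redirects, and 'status' is the final status code.
    """
    if status >= 400:
        return True  # an honest error status

    requested = urlsplit(requested_url)
    final = urlsplit(final_url)

    requested_is_deep = requested.path not in ("", "/")
    final_is_root = final.path in ("", "/") and not final.query

    # A deep link that lands on a bare root page was most likely redirected
    # to a homepage, so the cited content is gone even though status is 200.
    return requested_is_deep and final_is_root
```

A flagged link would still need a human look, and a page that keeps its URL but swaps in placeholder text would slip past this check.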
The average lifespan of a webpage is about 77 days. It matters not whether the site is still running or dead. Webmasters shuffle stuff about and delete things at will. Click on the random article button and see a) how many of the first 10 have external links, and b) how many of those links are still live and don't just redirect to the site's homepage. I reckon at least 50% of all external links on en.wp are dead. Lesson: the internet is ephemeral and the only permanent record is on physical material.
Yes, if you forget https://en.wikipedia.org/wiki/Destruction_of_the_Library_of_Alexandria.
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
On 30/06/2015 10:58, Ricordisamoa wrote:
Yes, if you forget https://en.wikipedia.org/wiki/Destruction_of_the_Library_of_Alexandria.
Well of course that was some 1700 years ago. You are equating an event of millennial proportion with something that happens every day? Get a grip on reality.
There's a point to be made there: Libraries in some countries are still being destroyed (see http://www.theguardian.com/world/2013/jan/28/mali-timbuktu-library-ancient-m... and http://elaph.com/Web/Culture/2015/2/985403.html or https://finance.yahoo.com/news/isis-burns-8000-rare-books-030900856.html), and although there's an effort to save them, it's not an effort we're really involved with. That's before we even start on how to reflect a reference to a non-existent book on Wikipedia!
But to get back to the original point, a semi-automated effort might be the best way (if the slowest) to get these web pages linked properly again.
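For the semi-automated route, one possible first step is a script that only builds a worklist and leaves every edit to a human. The sketch below is an assumption about how that could look, not an existing bot: the regular expression and the function name are made up for illustration, and it operates on wikitext that has already been fetched by some other means.

```python
import re

# Matches http(s) URLs on findarticles.com (with or without a subdomain),
# stopping at whitespace or at characters that usually end a URL in wikitext.
FINDARTICLES_URL = re.compile(
    r"https?://(?:[\w-]+\.)*findarticles\.com/[^\s\]|}<>]*",
    re.IGNORECASE,
)


def find_candidate_links(wikitext: str) -> list[str]:
    """Return the distinct findarticles.com URLs in a page, in order of appearance."""
    matches = (m.group(0) for m in FINDARTICLES_URL.finditer(wikitext))
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(matches))
```

Each URL on the resulting list could then be checked by hand for a DOI or an alternative source before anything is removed.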
Richard Symonds Wikimedia UK 0207 065 0992
Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects).
*Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents.*
Try to find links from https://archive.org/web/ , http://webcitation.org , or https://archive.is (note: this last one is blacklisted on enwp, iirc). I revived a few dead links (not from this host, though) from these archives.
And try to make a habit of archiving websites when you cite something. I think I once saw a project to automatically archive citation source links, but I'm not sure about its current status.
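Looking for an archived copy can be scripted as well. The following is a minimal sketch built on the Wayback Machine availability lookup as I understand it; the endpoint and the shape of the JSON reply should be checked against the Internet Archive's current documentation, and the helper names are my own. It makes no network call itself: one function builds the lookup URL, the other reads a reply that has already been downloaded.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; 'timestamp' is YYYYMMDD[hhmmss] if given."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the snapshot URL from a JSON reply, or None if nothing usable."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

A result of None would mean no snapshot was reported, which is the outcome the original post describes for findarticles.com.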
-- Revi https://revi.me -- Sent from Android --
How do you archive a website? I didn’t even know it was possible. Peter
-----Original Message----- From: wikimedia-l-bounces@lists.wikimedia.org [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Hong, Yongmin Sent: 30 June 2015 05:01 PM To: Wikimedia Mailing List Subject: Re: [Wikimedia-l] FindArticles.com died in 2012
Try to find links from https://archive.org/web/ , http://webcitation.org , or https://archive.is (note: this last one is blacklisted on enwp, iirc). I revived few deadlinks (not this host though) from these archives.
And try to make habit of archiving websites when you cite something. I think I once saw a project to automatically archive citation source links but not sure about current status.
The same way you do "File" -> "Save As" in your web browser. There are web services devoted to public web archival like those mentioned by Yongmin Hong. Adding links to archives in references is very good advice; Wikipedia already supports appending links to archives in the ref template parameters. It would be nice to see this happening automatically, and it makes me wonder whether the community has discussed it before.
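To make the template point concrete, here is a small sketch that formats a reference with archive parameters. It assumes the `archive-url` and `archive-date` parameters of `{{cite web}}` as used on English Wikipedia (other wikis may name them differently); the function itself is hypothetical and does no escaping of pipe characters inside values.

```python
def cite_web_with_archive(url: str, title: str, archive_url: str, archive_date: str) -> str:
    """Format a {{cite web}} call that points at both the original and the archive."""
    fields = [
        ("url", url),
        ("title", title),
        ("archive-url", archive_url),    # assumed enwiki parameter name
        ("archive-date", archive_date),  # assumed enwiki parameter name
    ]
    body = " ".join(f"|{name}={value}" for name, value in fields)
    return "{{cite web " + body + "}}"
```

The output is plain wikitext that can be pasted between `<ref>` tags.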
On Tue, 30 June 2015 at 11:32, Peter Southwood peter.southwood@telkomsa.net wrote:
How do you archive a website? I didn’t even know it was possible. Peter
On Tue, 30 June 2015 at 4:41, Lilburne lilburne@tygers-of-wrath.net wrote:
Lesson: the internet is ephemeral and the only permanent record is on physical material.
Digital media can be more technically demanding to work with, but I wouldn't say they are intrinsically ephemeral, much less that physical media are more enduring or that they are the only trustworthy medium. Both hard-copy and electronic information need a physical substratum; the difference is that digital computers make the substratum irrelevant, for they excel at making perfect copies to a different substratum at negligible cost. Networking and free/open formats enhance the advantage of digital media over physical ones.
The Web is to blame for the problem of dead links, not the Internet. The problem is particular to how URLs work. The copyright industry trolls can attest to how difficult it is to kill resources identified by name or content rather than by location, for instance by using magnet links.
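The idea of identifying a resource by content rather than by location can be shown with a toy example. This is only an illustration of the principle, not the magnet URI format (BitTorrent magnet links carry a hash of the torrent's metadata); the class and its methods are invented for this sketch.

```python
import hashlib


class ContentStore:
    """Toy store that names each blob by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and return their content-derived identifier."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Fetch by identifier and verify the bytes still match it."""
        data = self._blobs[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored bytes do not match their identifier")
        return data
```

Two independent stores that hold the same bytes hand out the same identifier, so a reference keeps working as long as any one copy survives, wherever it is hosted.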