Hi, I'm Kevin Brown, a GSoC student this year. I live in Melbourne, Florida and am attending Brevard Community College. My previous projects include work on bots on the English Wikipedia for tagging of uncategorized pages and new page patrol cleanup.
Almost since the web’s inception, link rot has been a major problem. Web-based content comes and goes, sometimes within a matter of hours. This hurts both users seeking to access the information and Wikipedia's core content policy of verifiability. While Wikipedia policy does not require users to use web citations, they are by far the most popular form of citation because they're easy for readers and editors to access.
To help solve this and ensure adherence to verifiability (WP:V), I plan to create an archival system over the summer, so that users can still reach externally linked content even if the original page goes down. This preemptive archival should effectively solve the problem of link rot, as long as the source site allows caching of its content. The project aims to produce something that "just works" without user input or requests and that integrates seamlessly with existing page parsing and rendering. Such a system will allow users to focus on content creation rather than on the distracting technical aspects of archival.
I would appreciate your help with the project. Specifically, I'd appreciate it if communities could start discussing this on their project's local village pump, so that we can start developing consensus for deployment. Also, please feel free to email me or find me on IRC under the nick kevin_brown if you have any questions.
I am currently drafting proposal and design documents and will be linking them as they become available. For now, please see a few relevant proposals:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_External_links/Webcitebot...
http://en.wikipedia.org/wiki/Wikipedia_talk:Link_rot#Proposal_for_new_WikiPr...
http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/WebCiteBOT
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Proposals/Dead_Li...
(Thanks to Neil and Sumana for helping me write this.)
Best, Kevin
You might want to dig into French Wikipedia. IIRC, they run a link archival service (there was discussion about enabling it for English Wikipedia, but I don't think it came to anything) and might have some helpful material.
I forget the name, I'm afraid; it's discussed somewhere on the en.wiki Village Pump, so I'll see if I can dig it out.
Tom Morton
On 1 Jun 2011, at 21:51, foo bar nnwiki@gmail.com wrote:
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikiwix, I think:
http://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archived_citatio...
http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_External_links/Webci...
Kevin, you should check out the second link above for projects which are potentially similar to yours.
Pete
On 6/1/11 13:59 PM, Thomas Morton wrote:
You might want to dig into French Wikipedia. IIRC, they run a link archival service (there was discussion about enabling it for English Wikipedia, but I don't think it came to anything) and might have some helpful material.
I forget the name, I'm afraid; it's discussed somewhere on the en.wiki Village Pump, so I'll see if I can dig it out.
Tom Morton
Welcome Kevin,
I tried to contact you a few days ago, but was unable to. Please create a wiki account (with email notifications enabled) and commit your USERINFO.
Hello
http://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archived_citatio...
If you want more explanation, you can contact me; we have built a solution to store external links.
It is already used by fr.wikipedia and hu.wikipedia.
Regards, Pascal Martin 06 13 89 77 32 02 32 40 23 69
----- Original Message ----- From: "Platonides" Platonides@gmail.com To: wikitech-l@lists.wikimedia.org Sent: Thursday, June 02, 2011 12:37 AM Subject: Re: [Wikitech-l] Archival for Web Citations (GSoC project)
Welcome Kevin,
I tried to contact you a few days ago, but was unable to. Please create a wiki account (with email notifications enabled) and commit your USERINFO.