[Wikitech-l] Archival for Web Citations (GSoC project)

1 Jun 2011

Hi, I'm Kevin Brown, a GSoC student this year. I live in Melbourne, Florida
and am attending Brevard Community College. My previous projects include work
on bots on the English Wikipedia for tagging of uncategorized pages and new
page patrol cleanup.

Almost since the web’s inception, link rot has been a major problem. Web-based
content comes and goes, sometimes within a matter of hours. This is a serious
problem, both for users seeking to access this information and for
Wikipedia's core content policy of verifiability. While Wikipedia policy does
not require users to use web citations, they are by far the most popular form
of citation, because they're easy for readers and editors to access.

To help solve this and ensure adherence to verifiability (WP:V), I plan to
create an archival system over the summer, so users can access all external
links even if they go down. This preemptive archival should effectively
solve the problem of link rot, as long as the source site allows caching of
its content. The project aims to get something that "just works" without
user input/request and to seamlessly integrate with existing page parsing
and rendering. Such a system will allow users to focus on content creation,
rather than the distracting technical aspects of archival.

I would appreciate your help with the project.  Specifically, I'd appreciate
it if communities could start discussing this on their projects' local village
pumps, so that we can start developing consensus for deployment.
Also, please feel free to email me or find me on IRC under the nick kevin_brown
with any questions you may have.

I am currently drafting proposal and design documents and will be linking
them as they become available.  For now, please see a few relevant
proposals:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_External_links/Webcitebo…
http://en.wikipedia.org/wiki/Wikipedia_talk:Link_rot#Proposal_for_new_WikiP…
http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/WebCiteBOT
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Proposals/Dead_L…

(Thanks to Neil and Sumana for helping me write this.)

Best,
Kevin
