Hi, I'm Kevin Brown, a GSoC student this year. I live in Melbourne, Florida
and am attending Brevard Community College. My previous projects include work
on bots on the English Wikipedia for tagging of uncategorized pages and new
page patrol cleanup.
Almost since the web’s inception, link rot has been a major problem. Web-based
content comes and goes, sometimes within a matter of hours. This is a serious
issue, both for users seeking to access this information and for Wikipedia's
core content policy of verifiability. While Wikipedia policy does not
require editors to use web citations, they are by far the most popular form of
citation, because they are easy for readers and editors to access.
To help solve this and ensure adherence to verifiability (WP:V), I plan to
create an archival system over the summer, so that users can access all external
links even if the original sites go down. This preemptive archival should
effectively solve the problem of link rot, as long as the source site allows
caching of its content. The project aims to produce something that "just works"
without user input or requests and that integrates seamlessly with existing page
parsing and rendering. Such a system will allow users to focus on content rather
than the distracting technical aspects of archival.
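As a rough illustration of the "allows caching" condition above, here is a minimal sketch (my own, not part of the actual design documents) of how an archiver might decide whether a fetched page permits archiving, by checking the X-Robots-Tag HTTP header and the robots meta tag for a "noarchive" directive. A real deployment would also need to honour robots.txt and site-specific policies; the function name and structure here are purely hypothetical.

```python
import re


def may_archive(headers: dict, html: str) -> bool:
    """Return True if nothing in the response forbids archiving.

    Checks the X-Robots-Tag response header and any
    <meta name="robots" ...> tags for a 'noarchive' directive.
    This is a sketch, not the project's actual implementation.
    """
    # Header-level directive, e.g. "X-Robots-Tag: noarchive"
    if "noarchive" in headers.get("X-Robots-Tag", "").lower():
        return False
    # Page-level directive, e.g. <meta name="robots" content="noarchive">
    for m in re.finditer(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I):
        if "noarchive" in m.group(0).lower():
            return False
    return True
```

For example, `may_archive({}, "<html></html>")` would allow archiving, while a response carrying `X-Robots-Tag: noarchive` would be skipped.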
I would appreciate your help with the project. Specifically, I'd appreciate
it if communities could start discussing this on their projects' local village
pumps, so that we can start developing consensus for deployment.
Also, please feel free to email me or find me on IRC under the nick kevin_brown
regarding any questions you may have.
I am currently drafting proposal and design documents and will link to
them as they become available. For now, please see a few relevant links.
(Thanks to Neil and Sumana for helping me write this.)