As an addendum, I'd like to say that I plan to have the feed available on
the Toolserver by the end of this week. The feed will be produced by a cron
job that copies recently added entries from the externallinks table.
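As a rough sketch of what that cron job might produce (the column names
`el_from` and `el_to` come from MediaWiki's standard externallinks table;
the JSON-lines feed format and the `build_feed` helper are purely
illustrative, not the extension's actual output):

```python
import json

def build_feed(rows):
    """Turn (page_id, url, timestamp) tuples -- as might be read from
    MediaWiki's externallinks table -- into a JSON-lines feed that a
    remote archiving service can poll."""
    lines = []
    for page_id, url, timestamp in rows:
        lines.append(json.dumps({
            "page_id": page_id,   # el_from in the externallinks table
            "url": url,           # el_to
            "added": timestamp,   # when the link appeared (illustrative field)
        }))
    return "\n".join(lines)

# Sample rows standing in for the result of a database query.
sample = [
    (12345, "http://example.org/report.pdf", "2011-09-19T13:02:00Z"),
]
print(build_feed(sample))
```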
From: Kevin Brown
Sent: Monday, September 19, 2011 1:02 PM
To: wikitech-l(a)lists.wikimedia.org
Subject: Status Update on Archive Links Extension
ArchiveLinks was created as a GSoC project to address the problem of linkrot
on Wikipedia. In articles we often cite or link to external URLs, but
anything could happen to content on
other sites -- if they move, change, or simply vanish, the value of the
citation is lost. ArchiveLinks rewrites external links in Wikipedia
articles, adding a '[cached]' link immediately after each one that points
to the web archiving service of your choice. It can also record the exact
time a link was added, so for services that archive multiple versions of
content (such as the Internet Archive) the cached link points to a copy of
the page made around the time the article was written.
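To illustrate the timestamped lookup: the Wayback Machine addresses
snapshots as `https://web.archive.org/web/<timestamp>/<url>`, so building
a '[cached]' link near the time a citation was added is just string
assembly (the `cached_link` helper is a sketch for this email, not the
extension's code):

```python
def cached_link(url, added):
    """Build a Wayback Machine URL for the snapshot closest to the time
    the link was added. `added` is a YYYYMMDDHHMMSS timestamp string,
    the format the Wayback Machine uses in its snapshot URLs."""
    return "https://web.archive.org/web/%s/%s" % (added, url)

# A '[cached]' target for a citation added on 19 September 2011:
print(cached_link("http://example.org/report.pdf", "20110919130200"))
```

The Wayback Machine redirects such a URL to whichever snapshot it holds
closest to the requested timestamp.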
ArchiveLinks also publishes, via the API, a feed of recently added
external links, so your favorite remote archiving service can crawl them
in a timely fashion. We have been talking with the Internet Archive about
this; they are
eager to get a list of the recent external links from Wikipedia since they
believe our community will probably be linking to some of the most important
and useful content on the web.
ArchiveLinks also contains a simple spidering system if you want to cache
the links yourself and display the cached copies through MediaWiki.
We completed almost all of our planned features
(https://secure.wikimedia.org/wikipedia/mediawiki/wiki/User:Kevin_Brown/Arch…)
and the next step is to campaign to get this adopted on Wikipedia. A lot of
people are enthusiastic about the concept, but it is likely we will get more
input on exactly what the "cached" link should look like, and it will take
some time to get a security review. At the same time, we are working with
the Internet Archive to set up a test site for them to crawl the feed
(perhaps from the Toolserver, before it is deployed on Wikipedia). Once
the feed is set up on the Toolserver, the Internet Archive will start
archiving every link that appears on it. That will leave producing the
cached link in the deployed version of MediaWiki as the last step toward
fixing linkrot everywhere it is possible.
(Thanks to Neil Kandalgaonkar for writing the majority of this email).