ArchiveLinks was created as a GSoC project to address the problem of linkrot on Wikipedia. Articles often cite or link to external URLs, but anything can happen to content on other sites: if a page moves, changes, or simply vanishes, the value of the citation is lost. ArchiveLinks rewrites external links in Wikipedia articles so that a '[cached]' link appears immediately afterwards, pointing to the web archiving service of your choice. It can even preserve the exact time the link was added, so for services that archive multiple versions of content (such as the Internet Archive) it links to a copy of the page made around the time the article was written.
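To make the time-preserving behavior concrete, here is a small illustrative sketch (the extension itself is written in PHP; this Python helper only shows the idea). It builds a Wayback Machine URL from the link target and the time the link was added, using the Wayback Machine's 14-digit timestamp path format; the function name and example URL are mine, not the extension's.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web/"

def cached_link(url: str, added: datetime) -> str:
    """Build a Wayback Machine URL pointing near the time the link was added."""
    timestamp = added.strftime("%Y%m%d%H%M%S")  # Wayback's YYYYMMDDhhmmss timestamp
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"

link = cached_link("http://example.com/cite",
                   datetime(2011, 9, 19, 13, 2, tzinfo=timezone.utc))
print(link)  # → https://web.archive.org/web/20110919130200/http://example.com/cite
```

The Wayback Machine redirects such a URL to the capture closest to the requested timestamp, which is what lets the '[cached]' link land near the article's revision date.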
ArchiveLinks also publishes a feed of recently added external links via the API, so your favorite remote service can crawl them in a timely fashion. We have been talking with the Internet Archive about this; they are eager to get a list of recent external links from Wikipedia, since they believe our community is probably linking to some of the most important and useful content on the web.
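As a rough sketch of what a consumer of such a feed might do: the exact feed format is not specified here, so the JSON shape below (a list of `url`/`added` entries) and the function name are hypothetical. The point is simply that a crawler can poll the feed and skip URLs it has already fetched.

```python
import json

# Hypothetical feed payload, standing in for what the API might return.
sample_feed = json.dumps([
    {"url": "http://example.com/a", "added": "2011-09-19T13:02:00Z"},
    {"url": "http://example.com/b", "added": "2011-09-19T13:05:00Z"},
    {"url": "http://example.com/a", "added": "2011-09-19T13:07:00Z"},
])

def urls_to_crawl(feed_json: str, seen: set) -> list:
    """Return URLs from the feed that have not been crawled yet."""
    fresh = []
    for entry in json.loads(feed_json):
        if entry["url"] not in seen:
            seen.add(entry["url"])
            fresh.append(entry["url"])
    return fresh

seen = set()
print(urls_to_crawl(sample_feed, seen))  # → ['http://example.com/a', 'http://example.com/b']
```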
ArchiveLinks also contains a simple spidering system, in case you want to cache the links yourself and display them through MediaWiki.
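The core of such a spider can be sketched in a few lines; the real ArchiveLinks spider is part of the PHP extension, so everything here (names, the injected `fetch` callable, the dict-as-cache) is illustrative only. The fetcher is passed in so the example needs no network access.

```python
from collections import deque

def spider(start_urls, fetch, cache):
    """Fetch each queued URL once and store the response body in the cache."""
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in cache:  # never fetch the same URL twice
            continue
        cache[url] = fetch(url)
    return cache

cache = spider(
    ["http://example.com/a", "http://example.com/b", "http://example.com/a"],
    fetch=lambda url: f"<html>body of {url}</html>",  # stand-in for an HTTP GET
    cache={},
)
print(sorted(cache))  # → ['http://example.com/a', 'http://example.com/b']
```

MediaWiki could then serve the cached bodies itself, which is the "display them through MediaWiki" half of the feature.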
We completed almost all of our planned features (https://secure.wikimedia.org/wikipedia/mediawiki/wiki/User:Kevin_Brown/Archi...) and the next step is to campaign for adoption on Wikipedia. A lot of people are enthusiastic about the concept, but we will likely get more input on exactly what the "cached" link should look like, and it will take some time to get a security review. At the same time, we are working with the Internet Archive to set up a test site for them to crawl the feed (perhaps from the Toolserver, before the extension is deployed on Wikipedia). Once the feed is set up on the Toolserver, the Internet Archive will start archiving every link that appears on it. That would leave rendering the "cached" link in the deployed version of MediaWiki as the last step toward fixing linkrot everywhere it is possible.
(Thanks to Neil Kandalgaonkar for writing the majority of this email).
As an addendum, I'd like to say that I plan to have the feed available on the Toolserver by the end of this week. The feed will be produced by copying data from the external link table via a cron job.
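A hedged sketch of what that cron job's core query might look like: MediaWiki's externallinks table stores one row per (page, URL) pair in its el_from / el_to columns, and the job would copy the distinct URLs out into the feed. The schema below is a simplified stand-in (SQLite in memory, no timestamp column); in practice the job would also need a way to select only links added since its last run.

```python
import sqlite3

# Simplified stand-in for MediaWiki's externallinks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE externallinks (el_from INTEGER, el_to TEXT)")
conn.executemany(
    "INSERT INTO externallinks VALUES (?, ?)",
    [(1, "http://example.com/a"), (2, "http://example.com/b"),
     (3, "http://example.com/a")],  # the same URL may appear on many pages
)

# The cron job copies the distinct URLs into the feed for the crawler.
feed = [row[0] for row in
        conn.execute("SELECT DISTINCT el_to FROM externallinks ORDER BY el_to")]
print(feed)  # → ['http://example.com/a', 'http://example.com/b']
```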
From: Kevin Brown
Sent: Monday, September 19, 2011 1:02 PM
To: wikitech-l@lists.wikimedia.org
Subject: Status Update on Archive Links Extension