We already store the external links.
Even before we did that, I did a lot of processing with them.
On enwiki a good number are rotted, yes, but far more have been replaced with (or always were) spam or spam-like content.
There is an absolutely insane number of them on the large wikis; it's a tough problem.
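If anyone wants to poke at them, here is a minimal sketch of pulling the stored links out and grouping them by host, assuming the usual externallinks table layout (el_from = page id, el_to = URL); the connection settings below are placeholders for your own wiki's database, not real values.

# Sketch only: dump stored external links grouped by host, assuming the
# externallinks table's el_from (page id) and el_to (URL) columns.
from collections import defaultdict
from urllib.parse import urlparse

import pymysql  # any DB-API 2.0 driver works the same way

conn = pymysql.connect(host="localhost", user="wikiuser",
                       password="secret", database="wikidb")

links_by_host = defaultdict(list)
with conn.cursor() as cur:
    cur.execute("SELECT el_from, el_to FROM externallinks")
    for page_id, url in cur.fetchall():
        if isinstance(url, bytes):  # el_to comes back as a blob
            url = url.decode("utf-8", "replace")
        links_by_host[urlparse(url).netloc.lower()].append((page_id, url))

# Hosts with the most links are a good place to start hunting for spam.
for host, links in sorted(links_by_host.items(),
                          key=lambda kv: len(kv[1]), reverse=True)[:50]:
    print(len(links), host)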
On 3/27/06, Lars Aronsson lars@aronsson.se wrote:
Did anybody implement a link rot checker for external links in MediaWiki? Here is a theory for how it could work:
When a new page is saved and the parser detects a "http:" pattern, the external link is stored in a separate database table, with a pointer to the wiki page where it was harvested. At regular intervals, all external links are tried (HTTP GET) by a background process and the success or failure rate is recorded. If a link becomes unavailable (HTTP ERROR) during three consecutive fetch attempts, it gets listed on a special page of possibly broken external links. Broken links from the same website could be grouped together. Maybe the whole site is broken, has moved, or has been internally reorganized.
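Something along these lines would do it as a background pass. This is only a sketch of the scheme described above: the function names, the in-memory failure counts, and the demo loop at the bottom are illustrative, not existing MediaWiki code; in a real job the counts would be stored next to the link and the passes would run from cron at whatever interval makes sense.

# Sketch of the checker described above: try each stored URL with an
# HTTP GET, count consecutive failures (in memory here; a real job
# would keep the count in the link table), and report links that have
# failed three fetch attempts in a row, grouped by site.
from collections import defaultdict
from urllib.parse import urlparse
import urllib.error
import urllib.request

FAILURE_THRESHOLD = 3                    # consecutive failures before reporting
consecutive_failures = defaultdict(int)  # url -> current failure streak

def is_alive(url, timeout=15):
    """True if the URL answers an HTTP GET without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        return False

def run_pass(urls):
    """One scheduled pass over all stored external links."""
    for url in urls:
        if is_alive(url):
            consecutive_failures[url] = 0  # recovered, reset the streak
        else:
            consecutive_failures[url] += 1

def broken_by_site():
    """Group probably-broken links by site, as the special page would."""
    report = defaultdict(list)
    for url, streak in consecutive_failures.items():
        if streak >= FAILURE_THRESHOLD:
            report[urlparse(url).netloc].append(url)
    return report

if __name__ == "__main__":
    sample = ["http://example.org/", "http://example.com/no-such-page"]
    # The three passes would really be days apart, driven by cron.
    for _ in range(FAILURE_THRESHOLD):
        run_pass(sample)
    for site, urls in sorted(broken_by_site().items()):
        print(site)
        for u in urls:
            print("   ", u)

Grouping the report by host also gives you the "maybe the whole site moved" view for free.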