We already store the external links.
Even before we did that, I did a lot of processing with them.
On enwiki, a good number are rotted, yes, but far more have been
replaced with (or always were) spam or spam-like content.
There is an absolutely insane number of them on the large wikis;
it's a tough problem.
On 3/27/06, Lars Aronsson <lars(a)aronsson.se> wrote:
Did anybody implement in MediaWiki a link rot checker for external
links? Here is a theory for how it could work:
When a new page is saved and the parser detects a "http:" pattern,
the external link is stored in a separate database table, with a
pointer to the wiki page where it was harvested. At regular
intervals, all external links are tried (HTTP GET) by a background
process and the success or failure rate is recorded. If a link
becomes unavailable (HTTP error) during three consecutive fetch
attempts, it gets listed on a special page of possibly broken
external links. Broken links from the same website could be
grouped together. Maybe the whole site is broken, has moved or
has been internally reorganized.
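The bookkeeping Lars describes could be sketched roughly as below. This is only an illustration of the proposed logic, not actual MediaWiki code: the function names, the in-memory failure counts (a real implementation would keep these in the database table alongside the harvested links), and the threshold constant are all made up for the example. One success resets a link's count; three consecutive failures flag it, and flagged links are grouped by site.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical threshold from the proposal: three consecutive failed
# fetches before a link is listed as possibly broken.
FAILURE_THRESHOLD = 3

def record_fetch(counts, url, ok):
    """Record one fetch attempt for url in the counts mapping.

    counts maps url -> number of consecutive failures so far.
    Returns True if this attempt pushes the link over the threshold.
    """
    if ok:
        counts[url] = 0          # a success resets the streak
        return False
    counts[url] += 1
    return counts[url] >= FAILURE_THRESHOLD

def group_by_site(flagged_urls):
    """Group possibly broken links by hostname, as the proposal suggests,
    so a whole moved or reorganized site shows up as one cluster."""
    sites = defaultdict(list)
    for url in flagged_urls:
        sites[urlparse(url).hostname].append(url)
    return dict(sites)
```

A background process would call `record_fetch` after each HTTP GET and feed the flagged URLs to `group_by_site` for the special page; the grouping makes site-wide breakage (a move or internal reorganization) stand out from individual dead pages.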