Has anybody implemented a link rot checker for external
links in MediaWiki? Here is a theory for how it could work:
When a page is saved and the parser detects an "http:" pattern,
the external link is stored in a separate database table, with a
pointer to the wiki page where it was harvested. At regular
intervals, all external links are tried (HTTP GET) by a background
process, and each success or failure is recorded. If a link
becomes unavailable (returns an HTTP error) during three consecutive
fetch attempts, it gets listed on a special page of possibly broken
external links. Broken links from the same website could be
grouped together. Maybe the whole site is broken, has moved or
has been internally reorganized.
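The scheme above could be sketched roughly like this. This is only
an illustration, not MediaWiki code: the table layout, the
three-failure threshold, and the pluggable fetch function are all
assumptions made for the sketch.

```python
import re
import sqlite3

# Hypothetical schema and threshold, not MediaWiki's actual tables.
LINK_RE = re.compile(r'https?://[^\s\]<>"]+')
FAILURE_THRESHOLD = 3  # consecutive failures before a link is flagged


def init_db(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS externallinks (
        url TEXT, page TEXT, failures INTEGER DEFAULT 0,
        PRIMARY KEY (url, page))""")


def harvest(conn, page_title, wikitext):
    """On page save: store each external link with a pointer to the
    wiki page where it was harvested."""
    for url in LINK_RE.findall(wikitext):
        url = url.rstrip('.,;:')  # crude trailing-punctuation cleanup
        conn.execute(
            "INSERT OR IGNORE INTO externallinks (url, page) VALUES (?, ?)",
            (url, page_title))


def check_all(conn, fetch):
    """Background job: try every link. `fetch(url)` should do an HTTP
    GET and return True on success. Success resets the counter;
    failure increments it, so only consecutive failures accumulate."""
    for (url,) in conn.execute("SELECT DISTINCT url FROM externallinks"):
        if fetch(url):
            conn.execute(
                "UPDATE externallinks SET failures = 0 WHERE url = ?", (url,))
        else:
            conn.execute(
                "UPDATE externallinks SET failures = failures + 1 "
                "WHERE url = ?", (url,))


def possibly_broken(conn):
    """Links that failed three consecutive checks, ordered by URL so
    that links from the same (possibly moved or reorganized) site
    end up grouped together."""
    return conn.execute(
        "SELECT url, page FROM externallinks WHERE failures >= ? "
        "ORDER BY url", (FAILURE_THRESHOLD,)).fetchall()
```

A quick dry run with an in-memory database and a stubbed-out fetcher:

```python
conn = sqlite3.connect(":memory:")
init_db(conn)
harvest(conn, "Test page",
        "See http://dead.example/a and http://live.example/b.")
for _ in range(3):                       # three background passes
    check_all(conn, lambda url: "dead" not in url)
print(possibly_broken(conn))             # only the dead link is listed
```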
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se