On Tue, Aug 31, 2010 at 5:31 PM, Ryan Kaldari <rkaldari@wikimedia.org> wrote:
A real-time feed of external links is overkill. As mentioned by others, the chief problem is linkrot of old links. All we need to do is dump the contents of externallinks.el_to from the database once a year or so, run a hex-to-ASCII conversion on it, zip it, and email it to someone at the Internet Archive. Anyone with access to the databases should be able to do this fairly easily. Rather than trying to engineer a complicated system that will take a year to implement, why not take this simple approach, which will take care of 90+% of the problem?
Ryan Kaldari
Why once a year? We already get a successful externallinks dump every dump cycle. Even the enwiki one is only half a month old[0]. If someone wants to work with the Internet Archive or anyone else on this, the data is already there.
-Chad
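
For anyone who wants to try the hex-to-ASCII and compression steps Ryan describes, here is a rough sketch in Python. It assumes the el_to values have already been exported as one hex string per line (optionally prefixed with "0x"); that input format, the function names, and the output path are all made up for illustration and are not taken from the actual dump tooling.

```python
import gzip


def decode_hex_urls(hex_lines):
    """Decode hex-encoded URL strings, skipping blank or malformed lines.

    Assumption: each line is one el_to value written as hex digits,
    possibly with a leading "0x".
    """
    urls = []
    for line in hex_lines:
        line = line.strip()
        if not line:
            continue
        if line.lower().startswith("0x"):
            line = line[2:]
        try:
            raw = bytes.fromhex(line)
        except ValueError:
            # Not valid hex (odd length or non-hex characters); skip it.
            continue
        # URLs should be ASCII-compatible; replace anything undecodable.
        urls.append(raw.decode("utf-8", errors="replace"))
    return urls


def write_compressed(urls, path):
    """Write one URL per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

Something like write_compressed(decode_hex_urls(open("el_to.hex")), "externallinks.txt.gz") would then produce a file that could be handed off, with "el_to.hex" standing in for whatever the export is actually called.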