Vinay from the Internet Archive asked me, with reference to http://meta.wikimedia.org/wiki/Help:Recent_changes :
Hi Sumana,
Is there someone I can contact about parsing the URLs out of the stream of recent changes? The idea is to grab the text of each recent change, extract anything that looks like a URL, and feed it into a queue at IA's end for archiving.
Looking at the Recent Changes feed, it looks like I'd need to parse the 'diff' page to find any new links, or in the case of 'new' pages, parse the new page to find all external links. Is there a better way? A live feed that includes the text that's changed for every article?
Thanks, Vinay
Vinay, #mediawiki on Freenode IRC, and possibly also the mediawiki-api mailing list, will be helpful to you.
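In the meantime, here is a rough sketch of the "extract anything that looks like a URL" step, assuming you already have the old and new text of a revision in hand (fetching them is left out). The names extract_urls and new_urls and the regular expression are my own illustrative assumptions, not anything MediaWiki provides, and the pattern is only an approximation of what counts as an external link in wikitext.

```python
import re

# Rough pattern for http(s) URLs. This is an assumption for illustration,
# not MediaWiki's own link parser; it stops at whitespace and at characters
# that commonly delimit links in wikitext or HTML.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\]\[|{}]+")


def extract_urls(text):
    """Return the set of URL-like strings found in text."""
    # Strip trailing punctuation that usually belongs to the sentence,
    # not to the URL itself.
    return {match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)}


def new_urls(old_text, new_text):
    """Return URLs present in new_text but absent from old_text, sorted."""
    return sorted(extract_urls(new_text) - extract_urls(old_text))


# Hypothetical revision texts standing in for a real edit.
old_revision = "See [http://example.org/a Example A] for details."
new_revision = (
    "See [http://example.org/a Example A] for details.\n"
    "Also <ref>https://example.com/b?x=1.</ref> and http://example.net/c)."
)

# Only the links introduced by the edit would go to the archiving queue.
archive_queue = new_urls(old_revision, new_revision)
print(archive_queue)
```

For a newly created page, passing an empty string as old_text returns every URL on the page. Comparing sets of URLs rather than parsing the rendered diff page means a link that was merely moved within the article is not queued again.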
Thanks all.