Hello Researchers,

I've been playing with the Recent Changes Stream Interface recently, and have started trying to use the API's "action=compare" to look at every diff on every wiki in real time. The goal is to produce real-time analytics on the content being added or deleted. The only problem is that this will really hammer the API with reads, since "action=compare" doesn't have a batch interface. Can I spawn multiple network threads and do 10+ reads per second indefinitely without the API complaining? Can I warn someone about this and get a special exemption for research purposes?
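For concreteness, here is a minimal sketch of what I mean, assuming the English Wikipedia endpoint (the endpoint constant, rate value, and helper names are just placeholders): each revision pair from the stream becomes one "action=compare" request, throttled client-side to some fixed requests-per-second budget.

```python
import time
import urllib.parse

# Assumption: one wiki's endpoint; the real pipeline would need one per wiki.
API = "https://en.wikipedia.org/w/api.php"

def compare_params(fromrev, torev):
    """Parameters for one action=compare call (one API read per diff)."""
    return {
        "action": "compare",
        "fromrev": fromrev,
        "torev": torev,
        "format": "json",
    }

def compare_url(fromrev, torev):
    return API + "?" + urllib.parse.urlencode(compare_params(fromrev, torev))

class Throttle:
    """Client-side rate limiter: at most `rate` calls per second."""
    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.last = 0.0

    def wait(self):
        now = time.monotonic()
        delay = self.last + self.interval - now
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

Even with the throttle, that's still one HTTP round trip per edit, which is the part I'd like advice (or an exemption) on.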

The alternative would be to use "action=query" to fetch the revisions in batches and do the diffing myself, but then I'm not guaranteed to be diffing the same way the site does.
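Roughly what I have in mind for that route (the endpoint is an assumption again, and the diff helper is just illustrative): batch up to 50 revision IDs per "action=query" call with prop=revisions, then diff the wikitext locally, e.g. with Python's stdlib difflib, which of course won't match MediaWiki's own diff engine.

```python
import difflib
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # assumption: one wiki's endpoint

def batch_revisions_params(revids):
    """One action=query call can fetch up to 50 revisions by ID."""
    assert len(revids) <= 50
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|content",
        "revids": "|".join(str(r) for r in revids),
        "format": "json",
    }

def local_diff(old_text, new_text):
    """Client-side diff; NOT guaranteed to match MediaWiki's diff output."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm=""))
```

That cuts the request count by ~50x, at the cost of diffs that won't line up exactly with what readers see on the site.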

What techniques would you recommend?


Make a great day,
Max Klein ‽ http://notconfusing.com/