Hello Researchers,
I've been playing with the Recent Changes Stream Interface
<https://wikitech.wikimedia.org/wiki/RCStream> recently, and have started
trying to use the API's "*action=compare*" to look at every diff of every
wiki in real time. The goal is to produce real-time analytics on the
content that's being added or deleted. The only problem is that it will
really hammer the API with lots of reads since it doesn't have a batch
interface. Can I spawn multiple network threads and do 10+ reads per second
forever without the API complaining? Can I warn someone about this and get
a special exemption for research purposes?
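
For concreteness, here is a minimal sketch of what the throttled "action=compare" approach might look like. The wiki URL, the `maxlag` value, and the names `compare_url` and `Throttle` are illustrative assumptions, not an agreed-upon setup; the 10-per-second figure is just the number mentioned above.

```python
import time
from urllib.parse import urlencode

# Assumed example endpoint; each wiki has its own api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def compare_url(from_rev, to_rev, api_url=API_URL):
    """Build an action=compare URL for two revision IDs."""
    params = {
        "action": "compare",
        "fromrev": from_rev,
        "torev": to_rev,
        "format": "json",
        # maxlag asks the server to reject the request when replication
        # lag is high; the value 5 is an assumption for this sketch.
        "maxlag": 5,
    }
    return api_url + "?" + urlencode(params)


class Throttle:
    """Space out calls so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

In use, each worker would call `throttle.wait()` before fetching `compare_url(old_id, new_id)`, ideally with a descriptive User-Agent header so the operators can identify the client.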
The other thing to do would be to use "*action=query*" to get the revisions
in batches and do the diffing myself, but then I'm not guaranteed to be
diffing in the same way that the site is.
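
A rough sketch of that batch alternative, for comparison: it builds one "action=query" request for several revision IDs and diffs the text locally with Python's `difflib`. The helper names `revisions_url` and `local_diff` are hypothetical, the endpoint is an assumed example, and `difflib` is a stand-in that will not necessarily match the site's own diff output.

```python
import difflib
from urllib.parse import urlencode

# Assumed example endpoint; each wiki has its own api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_url(rev_ids, api_url=API_URL):
    """Build one action=query URL that requests content for several revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        # Multiple values are pipe-separated; the API caps how many IDs
        # one request may carry, so large batches need to be chunked.
        "revids": "|".join(str(r) for r in rev_ids),
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def local_diff(old_text, new_text):
    """Return (added_lines, removed_lines) using difflib, not the site's differ."""
    added, removed = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return added, removed
```

This trades many small requests for fewer large ones, at the cost described above: the line-based result may differ from what the wiki itself would show for the same pair of revisions.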
What techniques would you recommend?
Make a great day,
Max Klein ‽
http://notconfusing.com/