On 17.11.2010, 16:39 Raymond wrote:
> Hi everyone,
>
> I plan to make a fair number of read-access calls to the Wikipedia API
> over the next several months and would like to know the best practices
> for efficient, fast access that doesn't hog resources. I've found that
> having a single thread that makes a call and waits for a response
> before making the next call has been extremely reliable (much more so
> than basically any other web API I've used before). What I'd like to
> do is make my application multithreaded and issue simultaneous calls
> to Wikipedia, since the speed of my application is limited by the rate
> at which I can read from it. I have the following questions:
> 1) What limits should I observe in terms of the number of calls I make
> per second
Per Domas, fewer requests with larger limits are better.
> and how many calls I should have going simultaneously?
One.
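For illustration, a minimal sketch of that pattern in Python (using the
requests library): one request in flight at a time, each asking for the
largest batch the server will grant. The User-Agent string is a
placeholder, and continuation follows the API's "continue" convention.

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "MyWikiReader/0.1 (contact@example.org)"}  # placeholder

    def iter_all_pages():
        """Yield page titles sequentially, one request at a time."""
        params = {
            "action": "query",
            "list": "allpages",
            "aplimit": "max",  # ask the server for its per-request ceiling
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params, headers=HEADERS).json()
            for page in data["query"]["allpages"]:
                yield page["title"]
            if "continue" not in data:
                break  # no more batches
            params.update(data["continue"])  # resume where this batch ended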
> 2) How would I know when I'm accessing the API too quickly or too
> often? I read at http://www.mediawiki.org/wiki/API:Errors_and_warnings
> that there is a "ratelimited" error message, but so far I've not seen
> that error myself. If I don't get a ratelimited error, does that mean
> I'm doing OK with respect to being a good API citizen?
Rate limits are for editing and logging in only.
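When a limit does apply, the API reports it in the response body rather
than through the HTTP status code. A rough sketch of checking for it,
under the same assumptions as the snippet above (the 30-second back-off
is an arbitrary placeholder):

    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "MyWikiReader/0.1 (contact@example.org)"}  # placeholder

    def query(params):
        """Issue one API request, backing off if the reply says 'ratelimited'."""
        params = dict(params, format="json")
        while True:
            data = requests.get(API, params=params, headers=HEADERS).json()
            if "error" not in data:
                return data
            if data["error"]["code"] == "ratelimited":
                time.sleep(30)  # wait and retry; tune to taste
            else:
                raise RuntimeError(data["error"]["info"])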
> 3) Even if I only require read access, should I identify myself
> explicitly to the API by logging in -- so that I can be contacted
> should there be a problem?
Logging in can help you get a higher limit if you're a sysop or bot.
However, identifying yourself with a User-Agent header is much more
important.
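For example (Python with the requests library; the tool name, URL, and
contact address are placeholders for your own details):

    import requests

    # A descriptive User-Agent names your tool and gives the site
    # operators a way to reach you if it misbehaves.
    HEADERS = {
        "User-Agent": "MyWikiReader/0.1 "
                      "(https://example.org/mywikireader; contact@example.org)"
    }

    r = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "meta": "siteinfo", "format": "json"},
        headers=HEADERS,
    )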
> 4) Does it make sense to try to obtain bot privileges (even for
> read-only access)? My understanding is that bots get access to larger
> payloads in some API calls.
See above. If your bot will generate a significant load, it's always
better to consult the sysadmins and the Bot Approvals Group (for the
English Wikipedia). The more details you provide, the more precise the
answer will be.
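If you want to check whether an account already has the higher ceiling,
the userinfo module reports the current session's rights;
"apihighlimits" is the right that raises the per-request limits (e.g.
aplimit goes from 500 to 5000). A sketch, reusing the placeholder
header from above:

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "MyWikiReader/0.1 (contact@example.org)"}  # placeholder

    # Reports rights for the current session; an anonymous session
    # won't have apihighlimits, a bot-flagged logged-in one will.
    r = requests.get(
        API,
        params={"action": "query", "meta": "userinfo",
                "uiprop": "rights", "format": "json"},
        headers=HEADERS,
    )
    rights = r.json()["query"]["userinfo"]["rights"]
    print("apihighlimits" in rights)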
--
Best regards,
Max Semenik ([[User:MaxSem]])