On 17.11.2010, 16:39 Raymond wrote:
> Hi everyone,
>
> I plan to make a fair number of read-access calls to the Wikipedia API
> over the next several months and would like to know the best practices
> for efficient, fast access that doesn't hog resources. I've found that
> having a single thread that makes a call and waits for a response
> before making the next call has been extremely reliable (much more so
> than basically any other web API I've used before). What I'd like to
> do is make my application multithreaded and issue simultaneous calls
> to Wikipedia, since the speed of my application is limited by the rate
> at which I can read from it. I have the following questions:
> 1) What limits should I observe in terms of the number of calls I make
> per second
Per Domas, fewer requests with larger limits are better.
> and how many calls I should have going simultaneously?
One.
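For illustration, a minimal sketch of that pattern in Python (using the
requests library): one request in flight at a time, each asking for the
largest batch the server will grant. The User-Agent string is a
placeholder, and continuation follows the API's "continue" convention.

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "MyWikiReader/0.1 (contact@example.org)"}  # placeholder

    def iter_all_pages():
        """Yield page titles sequentially, one request at a time."""
        params = {
            "action": "query",
            "list": "allpages",
            "aplimit": "max",  # ask the server for its per-request ceiling
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params, headers=HEADERS).json()
            for page in data["query"]["allpages"]:
                yield page["title"]
            if "continue" not in data:
                break  # no more batches
            params.update(data["continue"])  # resume where this batch ended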
> 2) How would I know when I'm accessing the API too quickly or too
> often? I read at http://www.mediawiki.org/wiki/API:Errors_and_warnings
> that there is a "ratelimited" error message, but so far I've not seen
> that error myself. If I don't get a ratelimited error, does that mean
> I'm doing OK with respect to being a good API citizen?
Rate limits are for editing and logging in only.
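When a limit does apply, the API reports it in the response body rather
than through the HTTP status code. A rough sketch of checking for it,
under the same assumptions as the snippet above (the 30-second back-off
is an arbitrary placeholder):

    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "MyWikiReader/0.1 (contact@example.org)"}  # placeholder

    def query(params):
        """Issue one API request, backing off if the reply says 'ratelimited'."""
        params = dict(params, format="json")
        while True:
            data = requests.get(API, params=params, headers=HEADERS).json()
            if "error" not in data:
                return data
            if data["error"]["code"] == "ratelimited":
                time.sleep(30)  # wait and retry; tune to taste
            else:
                raise RuntimeError(data["error"]["info"])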
> 3) Even if I only require read access, should I identify myself
> explicitly to the API by logging in -- so that I can be contacted
> should there be a problem?
Logging in can help you get a higher limit if you're a sysop or bot.
However, identifying yourself with a User-Agent header is much more
important.
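For example (Python with the requests library; the tool name, URL, and
contact address are placeholders for your own details):

    import requests

    # A descriptive User-Agent names your tool and gives the site
    # operators a way to reach you if it misbehaves.
    HEADERS = {
        "User-Agent": "MyWikiReader/0.1 "
                      "(https://example.org/mywikireader; contact@example.org)"
    }

    r = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "meta": "siteinfo", "format": "json"},
        headers=HEADERS,
    )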
> 4) Does it make sense to try to obtain bot privileges (even for
> read-only access)? My understanding is that bots get access to larger
> payloads in some API calls.
See above. If your bot will generate a significant load, it's always
better to consult the sysadmins and the Bot Approvals Group (for the
English Wikipedia). The more details you provide, the more precise the
answer will be.
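If you want to check whether an account already has the higher ceiling,
the userinfo module reports the current session's rights;
"apihighlimits" is the right that raises the per-request limits (e.g.
aplimit goes from 500 to 5000). A sketch, reusing the placeholder
header from above:

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "MyWikiReader/0.1 (contact@example.org)"}  # placeholder

    # Reports rights for the current session; an anonymous session
    # won't have apihighlimits, a bot-flagged logged-in one will.
    r = requests.get(
        API,
        params={"action": "query", "meta": "userinfo",
                "uiprop": "rights", "format": "json"},
        headers=HEADERS,
    )
    rights = r.json()["query"]["userinfo"]["rights"]
    print("apihighlimits" in rights)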
--
Best regards,
Max Semenik ([[User:MaxSem]])