Hi,
I have an application that, every five minutes or so, comes up with a list of 50-500 individual Wikipedia documents that it wants to request via curl. It wants to get them as quickly and as completely as possible, with minimal missed documents or dropped connections.
1) what is the etiquette for how to do this politely?
2) what are the best practices for using curl to submit API requests?
* is it best to use a single curl command with a long list of URLs to get, or a long list of individual curl commands?
* compressed or not?
* retry values?
* max-redirs?
* other parameters?
FredZ
Hi,
- what is the etiquette for how to do this politely?
See http://www.mediawiki.org/wiki/API:Etiquette.
- what are the best practice for using curl to submit API requests?
- is it best to use a single curl command with a long list of URLs to get,
or a long list of individual curl commands?
I think that doesn't matter much. What does matter is how many API requests you make. It's better to use a smaller number of requests, by combining them into one query. For example, to get the text of two pages in one query, you would use a URL like
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop...
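A sketch of such a batched request with curl (the titles and the exact parameter values below are illustrative assumptions, not taken from the truncated URL above; multiple titles are joined with "|"):

```shell
# Batch several page titles into one API call instead of one call per page.
# Parameter values here are examples; adjust them to your actual query.
titles="Albert Einstein|Marie Curie|Niels Bohr"   # "|"-separated batch
base="https://en.wikipedia.org/w/api.php"

# Build the command as an array so it can be inspected before running.
# -G puts the --data-urlencode pairs into the query string (a GET request),
# and --data-urlencode takes care of spaces and special characters in titles.
cmd=(curl -sG --compressed "$base"
     --data-urlencode "action=query"
     --data-urlencode "prop=revisions"
     --data-urlencode "rvprop=content"
     --data-urlencode "format=json"
     --data-urlencode "titles=$titles")

printf '%s\n' "${cmd[@]}"   # show the command; run it with: "${cmd[@]}"
```

Printing the array first makes the batching visible without hitting the server; drop the `printf` line and call `"${cmd[@]}"` directly once the query looks right.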
- compressed or not?
I think you should always use compression if the server supports it. And MediaWiki API does.
- retry values?
That depends on why you are retrying. If it's because of maxlag, the recommendation is to wait at least 5 seconds. But I think exponential backoff is even better.
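A minimal sketch of exponential backoff in shell, using the 5-second maxlag floor mentioned above as the starting delay (the function and variable names are my own; detecting a maxlag error in the response body is left out of this sketch):

```shell
# Retry a command with exponential backoff, starting at the 5-second
# minimum suggested for maxlag and doubling the wait after each failure.
# RETRY_BASE_DELAY can override the initial delay (handy for testing).
fetch_with_retry() {
  local max_tries=5
  local delay=${RETRY_BASE_DELAY:-5}
  local try
  for (( try = 1; try <= max_tries; try++ )); do
    if "$@"; then
      return 0                 # success: stop retrying
    fi
    echo "attempt $try failed; waiting ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))     # 5s, 10s, 20s, ...
  done
  return 1                     # gave up after max_tries attempts
}

# Example (illustrative parameters):
# fetch_with_retry curl -sfG --compressed "https://en.wikipedia.org/w/api.php" \
#     --data-urlencode "maxlag=5" --data-urlencode "action=query" ...
```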
- max-redirs?
The API shouldn't redirect you, as far as I know.
- other parameters?
They probably don't matter much. But as the Etiquette page notes, you probably shouldn't be making requests in parallel.
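Putting this together for the original 50-500-document workload, a serial loop over "|"-joined batches keeps the requests sequential; a sketch (the 50-titles-per-query cap is the usual limit for non-bot accounts, so treat it as an assumption to verify, and the file name is hypothetical):

```shell
# Split a newline-separated title list on stdin into "|"-joined
# batches of $1 titles, one batch per output line.
batch_titles() {
  local size=$1
  awk -v n="$size" '
    { buf = (count++ % n) ? buf "|" $0 : $0 }
    count % n == 0 { print buf }
    END { if (count % n) print buf }'
}

# Demo with a batch size of 2:
printf '%s\n' "Page A" "Page B" "Page C" "Page D" "Page E" | batch_titles 2

# Real use: one curl call per batch, strictly in sequence (no parallelism).
# batch_titles 50 < titles.txt | while IFS= read -r batch; do
#   curl -sG --compressed "https://en.wikipedia.org/w/api.php" \
#        --data-urlencode "action=query" --data-urlencode "format=json" \
#        --data-urlencode "titles=$batch"
# done
```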
Petr Onderka [[en:User:Svick]]
mediawiki-api@lists.wikimedia.org