Hello, I am extracting Wikipedia articles via the MediaWiki API (example: http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Olive&... ) and it works well most of the time, but sometimes the API is slow to answer, or worse, I get no response at all and my request falls into a timeout. I have tried many different cURL timeout parameters to work around it, but nothing is absolutely safe (e.g. 2 retries + a 7-second execution timeout)...
The problem occurs randomly on articles of any size (big or small), and I have noticed that after a failure the request always works on a second manual attempt...
TECH: I am using the SxWiki package (SxWiki.inc.php) with its bundled cURL methods. cURL error: "Operation timed out after 7000 milliseconds with 0 bytes received".
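For reference, a minimal sketch of the kind of request involved, using plain curl_* calls with the 7000 ms timeout from the error above (the actual SxWiki internals may differ, and the extra URL parameters elided in the example link are omitted here):

    <?php
    // Base request from the example; remaining query parameters left out.
    $url = 'http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Olive';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return body as string
    curl_setopt($ch, CURLOPT_TIMEOUT_MS, 7000);       // overall execution timeout
    $body = curl_exec($ch);
    if ($body === false) {
        // This is where "Operation timed out after 7000 milliseconds
        // with 0 bytes received" surfaces.
        echo curl_error($ch), "\n";
    }
    curl_close($ch);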
*** Do you know of this kind of problem with the API? Wikipedia API overload?
I have been trying to find a solution for weeks now and no method is 100% reliable! Thank you for your help. Oskar
2011/2/21 miguel oskar.wild@gmail.com:
Do you know of this kind of problem with the API? Wikipedia API overload? I have been trying to find a solution for weeks now and no method is 100% reliable!
The results of parsing the latest version of a page are cached in the parser cache. If you call action=parse on a page and the result is found in the cache, the response will be very fast. If it's not in the cache, however, the API has to parse the page there and then, which takes some time for larger pages. On your side, cURL gives up after 7 seconds (quite a low timeout for this purpose, IMO), but on our side the API keeps crunching, eventually finishes parsing, and stores the result in the cache. When you then retry the request, you are served that cached response almost instantaneously.
In short: parse results are usually served from cache (fast), but can be slow to generate when not in cache. In the latter case, be patient :)
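A practical consequence of this is that a retry loop with a growing timeout will usually succeed on the second attempt, because the first (timed-out) attempt has already warmed the parser cache on the server. A minimal sketch of that idea, assuming plain cURL and a hypothetical fetch_parse() helper (not part of SxWiki or the API):

    <?php
    // Hypothetical helper: retry with a growing timeout. The first attempt
    // may time out while the server is still parsing; by the time we retry,
    // the result is usually in the parser cache and returns almost instantly.
    function fetch_parse($page, $attempts = 3, $timeout_ms = 7000) {
        $url = 'http://en.wikipedia.org/w/api.php?action=parse&prop=text'
             . '&page=' . urlencode($page);
        for ($i = 0; $i < $attempts; $i++) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            // Grow the timeout on each retry: 7 s, 14 s, 21 s, ...
            curl_setopt($ch, CURLOPT_TIMEOUT_MS, $timeout_ms * ($i + 1));
            $body = curl_exec($ch);
            curl_close($ch);
            if ($body !== false) {
                return $body;
            }
            sleep(1); // give the server time to finish parsing and fill the cache
        }
        return false; // all attempts timed out
    }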
Roan Kattouw (Catrope)
mediawiki-api@lists.wikimedia.org