Thank you Michael. That was helpful. Will reach out to ops team.
On Thu, May 9, 2019 at 6:24 AM Michael Holloway mholloway@wikimedia.org wrote:
Aadithya,
About title batching, you're not missing anything: unlike the Action API (/w/api.php), the REST API (/api/rest_v1) page content endpoints take only a single title at a time.
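Since the page content endpoints take one title each, a list of titles turns into one request URL per title. A minimal sketch of that, assuming the /page/html/{title} route from the REST API docs linked in this thread; the helper names rest_html_url and rest_html_urls are hypothetical, not part of any Wikimedia library:

```python
from urllib.parse import quote


def rest_html_url(lang: str, title: str) -> str:
    """Build the REST API page-HTML URL for a single title on one language wiki."""
    # Page titles use underscores in place of spaces.
    normalized = title.replace(" ", "_")
    # safe="" also percent-encodes "/", so a title like "AC/DC" stays one path segment.
    encoded = quote(normalized, safe="")
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/html/{encoded}"


def rest_html_urls(lang: str, titles: list[str]) -> list[str]:
    """One URL per title: the REST endpoint has no multi-title form."""
    return [rest_html_url(lang, title) for title in titles]
```

For example, rest_html_urls("en", ["Albert Einstein", "AC/DC"]) yields two separate URLs, each of which has to be fetched on its own.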
It sounds like you may indeed be running into some periodic rate limit. The best sources of info on current rate limits are the Traffic engineers on the Site Reliability Engineering team (https://www.mediawiki.org/wiki/Wikimedia_Site_Reliability_Engineering); I'm not sure if any of them are subscribed to this list. You may have better luck asking on the Operations mailing list (ops@lists.wikimedia.org) or the #wikimedia-operations channel on IRC (irc://irc.freenode.net/wikimedia-operations).
On Wed, May 8, 2019 at 5:20 PM Aadithya C Udupa udupa.adithya@gmail.com wrote:
Hi, I am making the queries to get the latest HTML content for a title. I am using the API documented here https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_html__title_ and I may be missing something, but I do not see an option to send a list of titles. Also, I am working on a project to do some semi-structured and unstructured data extraction from Wikipedia HTML.
On Wed, May 8, 2019 at 1:23 PM Betacommand Betacommand@gmail.com wrote:
Why are you making so many queries? Have you tried batching pages together? What kind of project needs a real-time copy of a large data set?
On Wed, May 8, 2019 at 2:49 PM Aadithya C Udupa udupa.adithya@gmail.com wrote:
Thank you for the quick response, Michael. I was previously making close to 10 requests per second, but would frequently hit HTTP 429 errors. The etiquette document here https://www.mediawiki.org/wiki/API:Etiquette suggests making requests serially rather than in parallel, so I started making requests serially at one request per second, as I did not want to abuse the API. But as you can imagine, it takes a lot of time, especially when trying to expand to multiple languages. Also, I send a valid User-Agent header as described here https://meta.wikimedia.org/wiki/User-Agent_policy. What do you think could be other reasons why I hit the HTTP 429 error? Is there a cap on the total number of requests per day/week etc.?
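The serial approach described above can be combined with backing off whenever a 429 comes back, instead of relying on a fixed one-second pause alone. The following is only a sketch using the Python standard library: the function names, the retry count, and the delay values are illustrative assumptions, and it is also an assumption that the server may send a Retry-After header; when none is present, or it is not a plain number of seconds, the code falls back to exponential backoff.

```python
import time
import urllib.error
import urllib.request


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try (attempt is 0 for the first retry)."""
    if retry_after is not None:
        try:
            # Honor a numeric Retry-After value, clamped to [0, cap].
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to exponential backoff
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(url, user_agent, max_attempts=5,
                       sleep=time.sleep, opener=urllib.request.urlopen):
    """Fetch url serially, retrying only on HTTP 429 and always sending a User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for attempt in range(max_attempts):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 429, and the final failed attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            headers = err.headers
            retry_after = headers.get("Retry-After") if headers else None
            sleep(retry_delay(attempt, retry_after))
    raise RuntimeError("max_attempts must be at least 1")
```

The sleep and opener parameters exist only so the retry logic can be exercised without network access; in normal use the defaults apply.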
On Wed, May 8, 2019 at 10:43 AM Michael Holloway <mholloway@wikimedia.org> wrote:
Hi Aadithya,
According to the information at the top of the REST API docs page https://wikimedia.org/api/rest_v1/, you should in general be able to make up to 200 read requests per second to the REST API without any trouble. As far as I know, that information is accurate. Are you hitting 429s at a lower request rate than that?
To answer your question, sending requests in parallel to multiple language subdomains should not be a problem so long as your overall request rate remains lower than ~200/s.
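One way to keep the overall rate bounded while fetching from several language subdomains in parallel is to have every worker draw from a single shared limiter. This is a rough sketch, not an official client: the RateLimiter class is hypothetical, and the budget passed to it is something you would choose yourself, presumably well below the ~200/s figure mentioned above.

```python
import threading
import time


class RateLimiter:
    """Space out calls so the combined rate stays at or below max_per_second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def acquire(self):
        """Block until this caller's time slot arrives; return the seconds waited."""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot; slots are one interval apart.
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        wait = slot - now
        if wait > 0:
            # Sleep outside the lock so other threads can reserve later slots.
            self._sleep(wait)
        return wait
```

In use, threads fetching from en.wikipedia.org, fr.wikipedia.org and so on would all call acquire() on the same RateLimiter instance before each request, so the limit applies to the total rather than per language.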
On Tue, May 7, 2019 at 8:27 PM Aadithya C Udupa <udupa.adithya@gmail.com> wrote:
Hi, for one of my projects, I need to be able to keep the most up-to-date version of Wikipedia HTML pages for a few languages like en, zh, de, es, fr etc. This is currently done in two steps:
1. Listen to changes on the stream API documented here https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams and then extract the page titles.
2. For each of the titles, get the latest HTML using the Wikipedia REST API https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_title__title_ and persist the HTML.
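The title-extraction part of the first step can be sketched as a small filter over the lines of the event stream. This assumes the stream is the recentchange stream delivered as server-sent events with one JSON object per "data:" line, and that each event carries "type", "server_name", and "title" fields; those field names and the function name are assumptions to check against the EventStreams documentation, not something stated in this thread.

```python
import json


def titles_from_sse_lines(lines, wanted_hosts):
    """Yield (host, title) pairs for edits and new pages on the wanted wikis."""
    for line in lines:
        # Only "data:" lines carry the event payload; skip "event:", "id:", blanks.
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore partial or malformed payloads
        if not isinstance(event, dict):
            continue
        # Assumed schema: other event types (e.g. log entries) do not change page HTML.
        if event.get("type") not in ("edit", "new"):
            continue
        host = event.get("server_name")
        title = event.get("title")
        if host in wanted_hosts and title:
            yield host, title
```

Each (host, title) pair that comes out would then feed the second step, one REST request per title.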
I understand that in order to avoid the HTTP 429 (Too Many Requests) error, we need to limit API requests to 1 per second. I just wanted to check whether we can make requests to different languages like en.wikipedia.org, fr.wikipedia.org etc. in parallel, or whether those requests also need to be made serially (1 per second) in order to not hit the HTTP 429 error.
Please let me know if you need more information.
-- Regards, Aadithya -- Sent from my iPad3 _______________________________________________ Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
-- Michael Holloway Software Engineer, Reading Infrastructure