Why are you making so many queries? Have you tried batching pages together?
What kind of project needs a real-time copy of a large data set?
On Wed, May 8, 2019 at 2:49 PM Aadithya C Udupa <udupa.adithya(a)gmail.com>
wrote:
Thank you for the quick response, Michael.
I was previously making close to 10 requests per second, but would hit
HTTP 429 errors frequently. The etiquette document here
<https://www.mediawiki.org/wiki/API:Etiquette> suggests making requests
serially rather than in parallel, so I switched to serial requests at one
per second, as I did not want to abuse the API. But as you can imagine,
this takes a lot of time, especially when expanding to multiple languages.
Also, I send a valid User-Agent header, as described here
<https://meta.wikimedia.org/wiki/User-Agent_policy>.
What else could be causing me to hit HTTP 429 errors?
Is there a cap on the total number of requests per day, week, etc.?
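For reference, here is a minimal sketch (Python, stdlib only; the
User-Agent string and contact address are placeholders, not real values)
of a client that sends a descriptive User-Agent and backs off on HTTP
429, honoring the Retry-After header when the server provides one:

```python
# Polite HTTP client sketch: descriptive User-Agent, retry on 429 with
# Retry-After support, exponential backoff otherwise (capped at 60 s).
import time
import urllib.request
from urllib.error import HTTPError

# Hypothetical identification string; use your own tool name and contact.
USER_AGENT = "MyWikiMirror/0.1 (contact@example.com)"

def backoff_seconds(attempt, retry_after=None):
    """Delay before the next retry: honor Retry-After if the server sent
    it, otherwise exponential backoff (1, 2, 4, ... capped at 60 s)."""
    if retry_after is not None:
        return float(retry_after)
    return min(2 ** attempt, 60)

def fetch(url, max_attempts=5):
    """GET url, retrying only on HTTP 429; re-raise other errors."""
    for attempt in range(max_attempts):
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except HTTPError as e:
            if e.code != 429:
                raise
            time.sleep(backoff_seconds(attempt, e.headers.get("Retry-After")))
    raise RuntimeError("gave up after repeated 429s")
```

This is only a sketch of the retry logic, not a statement about why the
429s occur in your case.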
On Wed, May 8, 2019 at 10:43 AM Michael Holloway <
mholloway(a)wikimedia.org> wrote:
> Hi Aadithya,
>
> According to the information at the top of the REST API docs page
> <https://wikimedia.org/api/rest_v1/>, you should generally be able to
> make up to 200 read requests per second to the REST API without any
> trouble. As far as I know, that information is accurate. Are you hitting
> 429s at a lower request rate than that?
>
> To answer your question, sending requests in parallel to multiple
> language subdomains should not be a problem so long as your overall request
> rate remains lower than ~200/s.
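One way to enforce such an overall budget on the client side is a simple
token bucket shared by all workers, whatever subdomain each one targets.
A minimal sketch (Python; the rate and capacity values are arbitrary
examples, chosen well below the documented ~200/s):

```python
# Token-bucket rate limiter: all request threads call acquire() before
# sending, so the combined rate across language subdomains stays bounded.
import threading
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)
```

Each fetching thread would call `bucket.acquire()` immediately before
every HTTP request; the bucket, not the thread count, then determines the
aggregate request rate.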
>
> On Tue, May 7, 2019 at 8:27 PM Aadithya C Udupa <
> udupa.adithya(a)gmail.com> wrote:
>
>> Hi,
>> For one of my projects, I need to keep up-to-date copies of Wikipedia
>> HTML pages for a few languages (en, zh, de, es, fr, etc.). This is
>> currently done in two steps:
>> 1. Listen for changes on the stream API documented here
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams> and
>> extract the page titles.
>> 2. For each title, get the latest HTML using the Wikipedia REST API
>> <https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_title__title_>
>> and persist the HTML.
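The two steps above can be sketched roughly as follows (Python; the
event field names follow the public recentchange stream, and the URL
construction assumes `*.wikipedia.org` domains, so treat both as
assumptions to verify against the docs):

```python
# Step 1 helper: parse one recentchange event (the JSON payload of an SSE
# "data:" line) and keep only edits to the wikis we mirror.
# Step 2 helper: build the REST v1 URL for the latest HTML of a page.
import json

WIKIS = {"enwiki", "frwiki", "dewiki", "eswiki", "zhwiki"}

def extract_title(sse_data_payload):
    """Return (wiki, title) for edit events on tracked wikis, else None."""
    event = json.loads(sse_data_payload)
    if event.get("wiki") in WIKIS and event.get("type") == "edit":
        return event["wiki"], event["title"]
    return None

def html_url(wiki, title):
    """REST v1 URL for the latest HTML of a page (assumes the dbname maps
    to a <lang>.wikipedia.org domain, e.g. enwiki -> en.wikipedia.org)."""
    domain = wiki.removesuffix("wiki") + ".wikipedia.org"
    return f"https://{domain}/api/rest_v1/page/html/{title.replace(' ', '_')}"
```

The actual SSE connection and HTML persistence are omitted; this only
shows the title extraction and URL construction glue between the two
steps.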
>>
>> I understand that in order to avoid HTTP 429 (Too Many Requests)
>> errors, we need to limit API requests to 1 per second. I just wanted to
>> check whether requests to different languages, like en.wikipedia.org
>> and fr.wikipedia.org, can be made in parallel, or whether those
>> requests also need to be serial (1 per second) to avoid HTTP 429
>> errors.
>>
>> Please let me know if you need more information.
>>
>>
>> --
>> Regards,
>> Aadithya
>> --
>> Sent from my iPad3
>> _______________________________________________
>> Mediawiki-api mailing list
>> Mediawiki-api(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>
>
>
> --
> Michael Holloway
> Software Engineer, Reading Infrastructure
>
--
Regards,
Aadithya