Sorry for the late reply.

On Sat, Nov 2, 2019 at 12:31 PM Andra Waagmeester <andra@micel.io> wrote:
Thanks for your prompt response. I was only checking for 503, not for 429, so that might explain it.
This is my current countermeasure against overloading the system:

With only a quick look at the code, it looks good enough to me. A few things you might want to improve:

* L1148 [1]: use a default retry_after of 60 seconds instead of 30. That's the upper bound of what our throttling will ask you to wait.
* L1186-L1189 [2]: in case of a 429, you can read the "Retry-After" header to get the sleep value our throttling expects.
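As an illustration only, here is a minimal Python sketch of those two suggestions. It assumes each response exposes a status code and a header mapping. The names `retry_delay`, `call_with_retries` and `send` are hypothetical and are not part of WikidataIntegrator. The sketch also honors "Retry-After" on a 503 when the header is present; that extra behavior is an assumption, not something the throttling advice above requires.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Default wait when no usable Retry-After header is present: 60 s, the upper
# bound the throttling asks for (per the advice above).
DEFAULT_RETRY_AFTER = 60
RETRYABLE_STATUSES = {429, 503}

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_delay(status: int, headers: Mapping[str, str],
                default: int = DEFAULT_RETRY_AFTER) -> Optional[int]:
    """Seconds to sleep before retrying, or None if the status is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    # HTTP header names are case-insensitive; normalize for plain dicts.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0, int(value))
        except ValueError:
            # Retry-After may also be an HTTP-date; this sketch falls back to the default.
            pass
    return default


def call_with_retries(send: Callable[[], Response], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call the hypothetical `send` until it is not throttled or retries run out."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        delay = retry_delay(status, headers)
        if delay is None or attempt == max_retries:
            return status, headers, body
        sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the waiting logic testable without actually pausing, and the retry count is capped so a persistent ban does not turn into an endless loop.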


If you follow all that, you should be good. If you still see throttling / ban, let us know. If you give me the User-Agent of your script and the time at which you received the throttling / ban response, I can have a look at the logs.

Where do I let you know? Is this email list the right place to do so?

This list is the right place. Or you can contact me directly if you want. But others might benefit from this discussion being public.

[1] https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1148
[2] https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1186-L1189


Wikidata-tech mailing list

Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation