Thanks for trying to not overload the service!
There is some minimal documentation on the throttling done by Wikidata
Query Service, but it clearly needs to be improved.
High level overview:
Throttling is done per "client", where a client is identified by its
user-agent and IP address (yes, this is a flawed definition of a client, but
it mostly works for throttling purposes). Limits are set on query execution
time and on the number of errors raised by the client. When the limits are
reached, an HTTP 429 response is sent to the client, with a "Retry-After"
HTTP header. This header contains an estimate of how long a client should
wait before retrying a request (in seconds). If we see a client that seems
to ignore HTTP 429 for long enough, that client is going to be banned for a
longer period of time.
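For illustration, a throttled response looks roughly like this (the status
line and header name follow the HTTP spec; the 60-second value is only an
example, not actual service output):

```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```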
What you can do:
* don't execute more than one request in parallel
* set a User-Agent specific to your application (see the Wikimedia
User-Agent policy for documentation)
* when receiving an HTTP 429 response, pause for the duration given in the
Retry-After header, or for 1 minute if the header is missing
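The points above can be sketched in Python. This is a minimal illustration,
not official client code: the endpoint URL is real, but the User-Agent string
and function names are made up for the example, and only the numeric form of
Retry-After is handled.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Identify your tool per the Wikimedia User-Agent policy; this value is made up.
USER_AGENT = "ExampleBot/0.1 (https://example.org/examplebot; bot@example.org)"


def retry_after_seconds(headers, default=60):
    """Return the Retry-After delay in seconds, or `default` (1 minute)
    when the header is missing or not a plain number of seconds."""
    try:
        return max(0, int(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default


def run_query(query, max_attempts=5):
    """Run one SPARQL query at a time, sleeping on HTTP 429 as instructed."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    for _ in range(max_attempts):
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # bans (403) and other errors should not be retried blindly
            time.sleep(retry_after_seconds(err.headers))
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
```

Because requests are issued strictly one at a time and the loop sleeps for
the advertised delay before retrying, this sketch satisfies all three points
above.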
If you follow all that, you should be good. If you still see throttling or
bans, let us know. If you give me the User-Agent of your script and the time
at which you received the throttling / ban response, I can have a look into
the logs.
Note that we might have some degenerate behaviour when the service is
already overloaded (I don't think so, but who knows).
On Sat, Nov 2, 2019 at 11:37 AM Andra Waagmeester <andra(a)micel.io> wrote:
I hope this is the right mailing list to discuss this issue.
Some time ago I ran into a series of temporary bans, which I thought I had
managed to tackle by doing a full stop as soon as my bot got any response
status code other than 200.
However, this seems not to have fixed it, since I received the following
"requests.exceptions.HTTPError: 403 Client Error: You have been banned
until 2019-10-18T10:21:36.495Z, please respect throttling and retry-after
headers. for url: https://query.wikidata.org/sparql"
I am looking into this from scratch to see if I can implement a better
solution, one that really respects the retry-after time instead of doing a
full stop.
Whatever I try now, I keep getting 200 responses, and I don't want to start
an excessive bot run just to get into a ban state and see the exact header
that the bot needs to respect.
Is there an example of such a header which I can use to make my own test?
Or is there example Python code that successfully deals with a 429 response?
Wikidata-tech mailing list
Engineering Manager, Search Platform
UTC+2 / CEST