Hello all!
Following up on the previous communication below, the change to our throttling policy was deployed yesterday (2017-10-23 17:00 UTC). Reviewing the logs so far, I don't see any change of pattern in the number of throttled requests. This means that almost no one should be affected, or at least not affected more than you already were.
Feel free to reach out to me if that's not the case.
Have fun!
Guillaume
On Thu, Oct 19, 2017 at 10:14 AM, Guillaume Lederrey <glederrey@wikimedia.org> wrote:
Hello all!
As you might have seen / endured, we had a partial outage of the Wikidata Query Service (WDQS) yesterday morning (Central European Time). The full incident report is available [1] if you are interested in the details. The short version:
- a single client started to run an unusually high number of queries on WDQS
- the overload was not prevented by our current throttling
- the failure was not detected and isolated automatically
To prevent this from happening again, we will review our throttling rules. Those rules were previously tuned to prevent a single client from overloading the service with a small number of expensive requests: we started to log a client's activity only when the duration of a request exceeded 10 seconds, which means that a client sending tons of short requests would never be throttled.
We will correct that by lowering the threshold, probably to 25 ms. The throttling rules themselves are unchanged:
- 60 seconds of processing time per minute (peaking at 120 seconds)
- 30 errors per minute (peaking at 60)
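One way to read "60 seconds of processing time per minute (peaking at 120 seconds)" is as a token bucket: a budget that refills at 1 second of processing time per second of wall-clock time and can hold at most 120 seconds. The sketch below only illustrates that reading. It is not the WDQS implementation, and the class name, method names and the token-bucket interpretation itself are assumptions.

```python
class ProcessingTimeBucket:
    """Hypothetical per-client budget of processing time (token bucket)."""

    def __init__(self, refill_per_second=1.0, capacity=120.0):
        # 1 token per second is 60 seconds of processing time per minute.
        self.refill_per_second = refill_per_second
        # The "peak": the budget never grows beyond this many seconds.
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def _refill(self, now):
        # Credit the time elapsed since the last update, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now

    def record(self, now, processing_seconds):
        # Charge the client for the processing time a request consumed.
        self._refill(now)
        self.tokens -= processing_seconds

    def is_throttled(self, now):
        # The client is throttled once its budget is exhausted.
        self._refill(now)
        return self.tokens <= 0
```

Under this reading, a client that burns 120 seconds of processing time at once is throttled immediately, and gets budget back as time passes without new requests.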
If you are using WDQS to make lots of small requests, and you are over the throttling rates above, there is a chance that you will start seeing throttling errors. We are not doing this to bother you; we're just trying to keep another crash from happening...
If you are throttled, you will receive an HTTP 429 error code [2]. This response includes the "Retry-After" HTTP header, which specifies the number of seconds you should wait before retrying.
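For client authors, here is a minimal sketch of how the 429 / "Retry-After" behaviour described above could be handled. It is independent of any particular HTTP library: `send` is a hypothetical callable, supplied by you, that performs one request and returns `(status, headers, body)`. The function names, the default delay and the attempt limit are all assumptions, and the header lookup assumes a plain dict keyed exactly as "Retry-After".

```python
import time


def retry_delay(headers, default=5.0):
    """Seconds to wait, taken from Retry-After; `default` is an assumed fallback."""
    raw = headers.get("Retry-After", "")
    try:
        return max(0, int(raw))
    except ValueError:
        # Header missing or not a plain number of seconds.
        return default


def query_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it returns something other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Throttled: wait as long as the server asked before retrying.
            sleep(retry_delay(headers))
    # Still throttled after the last attempt; let the caller decide.
    return status, body
```

The point of the design is simply to wait for the time the server asks for instead of retrying immediately, since immediate retries only add to the load that triggered the throttling.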
Thanks for your patience!
And contact me if you want any clarification.
Guillaume
[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20171018-wdqs
[2] https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#429
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST