Hi all,

As part of the ongoing work to ensure that Wikidata Query Service (WDQS) continues to be available and functional for users, we have implemented a Service Level Objective (SLO) for WDQS uptime. We currently aim to maintain 95% uptime over a 90-day rolling window, in keeping with Wikimedia Foundation (WMF) SLO reporting standards.

Effectively, this means that the WMF, together with Wikimedia Deutschland (WMDE), will be responsible for making sure that WDQS is available at least 95% of the time, which equates to the following acceptable amounts of downtime:

Daily: 1h 12m
Weekly: 8h 24m
Monthly: 1d 12h 31m 27s
Quarterly: 4d 13h 34m 22s
Yearly: 18d 6h 17m 28s
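
For anyone who wants to check the arithmetic, here is a minimal Python sketch of how downtime budgets like these can be derived from a 95% target. The period lengths are an assumption on my part (month, quarter and year taken from an average Gregorian year of 365.2425 days), and the function names are purely illustrative; this is not how the dashboard itself computes uptime.

```python
from datetime import timedelta

SLO_TARGET = 0.95  # uptime target stated in this email

# Assumed period lengths in days. Month, quarter and year are derived
# from an average Gregorian year of 365.2425 days (an assumption).
DAYS_PER_YEAR = 365.2425
PERIOD_DAYS = {
    "Daily": 1,
    "Weekly": 7,
    "Monthly": DAYS_PER_YEAR / 12,
    "Quarterly": DAYS_PER_YEAR / 4,
    "Yearly": DAYS_PER_YEAR,
}


def downtime_budget(period_days, slo=SLO_TARGET):
    """Return the allowed downtime for a period, rounded to whole seconds."""
    seconds = round(period_days * 86400 * (1 - slo))
    return timedelta(seconds=seconds)


def format_budget(budget):
    """Format a timedelta as e.g. '1h 12m', omitting zero-valued units."""
    total = int(budget.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (seconds, "s")]
    return " ".join(f"{value}{unit}" for value, unit in parts if value) or "0s"


if __name__ == "__main__":
    for name, days in PERIOD_DAYS.items():
        print(f"{name}: {format_budget(downtime_budget(days))}")
```

Under these assumptions the daily and weekly budgets come out to 1h 12m and 8h 24m, and calling downtime_budget(90) for the 90-day reporting window gives 4d 12h.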

It also means that we can focus on other priorities as long as we are meeting this SLO. We believe this will help formalize our commitment to keeping WDQS available for users while still leaving time to work on long-term scaling initiatives for the future of WDQS. It will also formalize the limits of what we are able to support, so that we can avoid overreacting to fluctuations in the performance of an inherently unstable system, which in the past has required waking people up on weekends to resolve issues.

The current status of the SLO is available here:
https://grafana.wikimedia.org/d/l-3CMlN4z/wdqs-uptime-slo?orgId=1
The gauge (top left) shows the current WDQS uptime over the past 90 days. The WDQS Uptime SLO graph (top right) shows the historical point-in-time uptime based on user traffic, with the red horizontal line marking our SLO of 95%. For specifics on how our uptime metric is computed, see our WDQS SLO documentation.

As always, we appreciate your patience as we work on improving WDQS. If you have further questions about the new uptime SLO, please don’t hesitate to respond to this email.

Best,
Ryan Kemper
Site Reliability Engineer, Search Platform