Hi all,
As part of the ongoing work to ensure that Wikidata Query Service (WDQS) continues to be available and functional for users, *we have implemented a Service Level Objective (SLO) for WDQS uptime*. We currently aim to maintain *95% uptime* over a *90-day rolling window*, in keeping with Wikimedia Foundation (WMF) SLO reporting standards: https://wikitech.wikimedia.org/wiki/SLO#SLO_reporting
Effectively, this means that the WMF, along with Wikimedia Deutschland (WMDE), will be responsible for *making sure that WDQS is available at least 95% of the time*, which equates to the following acceptable amounts of downtime:
*Daily*: 1h 12m
*Weekly*: 8h 24m
*Monthly*: 1d 12h 13m 27s
*Quarterly*: 4d 12h 40m 22s
*Yearly*: 18d 2h 41m 28s
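For readers who want to reproduce the arithmetic: each figure is 5% of the period length. The daily and weekly values follow exactly; the longer ones depend on how long a month, quarter, or year is taken to be. The sketch below assumes fixed 30-, 90-, and 365-day periods (an assumption, not necessarily the convention behind the figures quoted above, so the longer-period results differ slightly). The name `downtime_budget` is illustrative only.

```python
from datetime import timedelta

SLO_TARGET = 0.95  # uptime target stated in the announcement

# Period lengths are assumptions; other calendar conventions
# (e.g. an average month of ~30.44 days) give slightly different results.
PERIODS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "30-day month": timedelta(days=30),
    "90-day window": timedelta(days=90),
    "365-day year": timedelta(days=365),
}


def downtime_budget(period: timedelta, target: float = SLO_TARGET) -> timedelta:
    """Return the downtime allowed within `period` at the given uptime target."""
    allowed_seconds = period.total_seconds() * (1 - target)
    # Round to whole seconds to absorb floating-point noise.
    return timedelta(seconds=round(allowed_seconds))


if __name__ == "__main__":
    for name, period in PERIODS.items():
        print(f"{name}: {downtime_budget(period)}")
```

Under these assumptions, the 90-day rolling window used for the SLO allows 4 days and 12 hours of downtime in total.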
It also means that, as long as we are meeting this SLO, we can focus on other priorities. We believe this should help formalize our commitment to keeping WDQS available for users while still making time to work on long-term scaling initiatives for the future of WDQS. It will also formalize the limits of what we are able to support, so that we can avoid overreacting to fluctuations in the performance of an inherently unstable system, which has previously meant waking people up on weekends to resolve them.
The current status of the SLO is available here: https://grafana.wikimedia.org/d/l-3CMlN4z/wdqs-uptime-slo?orgId=1
The gauge (top left) indicates the current WDQS uptime over the past 90 days. The WDQS Uptime SLO graph (top right) shows the historical point-in-time uptime based on user traffic, with the red horizontal line marking our SLO of 95%. For specifics on how our uptime metric is computed, see our WDQS SLO documentation: https://wikitech.wikimedia.org/wiki/SLO/WDQS#Service_Level_Indicators_(SLIs)
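To illustrate what a rolling-window uptime figure means in general, here is a rough sketch. It is not the actual WDQS SLI, whose definition lives in the documentation linked above. It assumes, purely for illustration, that uptime is the share of successful requests over the trailing 90 days; the function name `rolling_uptime` and the `(good_requests, total_requests)` input shape are hypothetical.

```python
from collections import deque

WINDOW_DAYS = 90   # rolling window stated in the announcement
SLO_TARGET = 0.95  # uptime target stated in the announcement


def rolling_uptime(daily, window=WINDOW_DAYS):
    """Yield the uptime ratio over the trailing `window` days, once per day.

    `daily` is an iterable of (good_requests, total_requests) pairs.
    This request-ratio definition is an assumption for illustration only.
    """
    recent = deque(maxlen=window)  # oldest days drop out automatically
    for good, total in daily:
        recent.append((good, total))
        good_sum = sum(g for g, _ in recent)
        total_sum = sum(t for _, t in recent)
        # With no traffic in the window there is nothing to count as failed.
        yield good_sum / total_sum if total_sum else 1.0


if __name__ == "__main__":
    # Hypothetical data: 90 fully healthy days followed by 5 full-outage days.
    samples = [(100, 100)] * 90 + [(0, 100)] * 5
    latest = list(rolling_uptime(samples))[-1]
    print(f"uptime={latest:.4f} meets_slo={latest >= SLO_TARGET}")
```

In this made-up series, five consecutive outage days exceed the 4.5-day budget of a 90-day window, so the final value falls just below the 95% target.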
As always, we appreciate your patience as we work on improving WDQS. If you have further questions about the new uptime SLO, please don’t hesitate to respond to this email.
Best,
Ryan Kemper
Site Reliability Engineer, Search Platform
Finally, some sanity and perfectly acceptable SLO downtimes for ... a FREE SERVICE TO THE WORLD.
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
On Thu, Feb 2, 2023 at 2:15 AM Ryan Kemper rkemper@wikimedia.org wrote:
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/mes...
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org