Hi all,
As part of the ongoing work to ensure that Wikidata Query Service (WDQS)
continues to be available and functional for users, *we have implemented a
Service Level Objective (SLO) for WDQS uptime*. We currently aim to
maintain *95% uptime* over a *90-day rolling window*, in keeping with
Wikimedia Foundation (WMF) SLO reporting standards
<https://wikitech.wikimedia.org/wiki/SLO#SLO_reporting>.
In practice, this means that the WMF, together with Wikimedia Deutschland
(WMDE), is responsible for *making sure that WDQS is available at least
95% of the time*, which equates to the following amounts of acceptable
downtime:
*Daily*: 1h 12m
*Weekly*: 8h 24m
*Monthly*: 1d 12h 31m 27s
*Quarterly*: 4d 13h 34m 22s
*Yearly*: 18d 6h 17m 28s
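For anyone who wants to check the arithmetic behind these allowances, each
one is the length of the period multiplied by the permitted downtime
fraction (1 - 0.95). Below is a minimal Python sketch. It assumes a mean
Gregorian year of 365.2425 days, with a month as 1/12 and a quarter as 1/4
of that; those period lengths, and the function names, are my own
assumptions for illustration and are not taken from the WMF SLO
documentation.

```python
from datetime import timedelta

SLO_TARGET = 0.95          # uptime target stated in this email
DAYS_PER_YEAR = 365.2425   # assumption: mean Gregorian year

# Period lengths in days. Month and quarter lengths are assumptions.
PERIODS_IN_DAYS = {
    "Daily": 1,
    "Weekly": 7,
    "Monthly": DAYS_PER_YEAR / 12,
    "Quarterly": DAYS_PER_YEAR / 4,
    "Yearly": DAYS_PER_YEAR,
}


def downtime_budget(period_days, slo=SLO_TARGET):
    """Return the allowed downtime for a period, rounded to whole seconds."""
    seconds = round(period_days * 86400 * (1 - slo))
    return timedelta(seconds=seconds)


def format_budget(budget):
    """Format a timedelta as e.g. '1h 12m', omitting zero-valued units."""
    total = int(budget.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (seconds, "s")]
    return " ".join(f"{value}{unit}" for value, unit in parts if value) or "0s"


if __name__ == "__main__":
    for name, days in PERIODS_IN_DAYS.items():
        print(f"{name}: {format_budget(downtime_budget(days))}")
```

The daily and weekly figures do not depend on the calendar assumption; the
monthly, quarterly, and yearly figures shift slightly if a different month
or year length is used.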
The SLO also means that we can focus on other priorities as long as we are
meeting it. We believe this will help formalize our commitment to keeping
WDQS available for users while still leaving time to work on long-term
scaling initiatives for the future of WDQS. It will also formalize the
limits of what we are able to support, so that we can avoid overreacting
to fluctuations in the performance of an inherently unstable system, which
has previously meant waking people up on weekends to resolve them.
The current status of the SLO is available here:
https://grafana.wikimedia.org/d/l-3CMlN4z/wdqs-uptime-slo?orgId=1
The gauge (top left) indicates the current WDQS uptime over the past 90
days. The WDQS Uptime SLO graph (top right) indicates the historic
point-in-time uptime based on user traffic, with the red horizontal line at
our SLO of 95%. For specifics on how our uptime metric is computed, see
our WDQS SLO documentation
<https://wikitech.wikimedia.org/wiki/SLO/WDQS#Service_Level_Indicators_(SLIs)>.
As always, we appreciate your patience as we work on improving WDQS. If you
have further questions about the new uptime SLO, please don’t hesitate to
respond to this email.
Best,
Ryan Kemper
Site Reliability Engineer, Search Platform