Hello all!
TL;DR: the alerting thresholds on Wikidata Query Service have been raised; any Icinga alert should now be treated seriously.
As you might already know, we're having trouble keeping up with updates on the public Wikidata Query Service (WDQS) cluster. We're working on it, but it is a hard problem. At the same time, the known use cases of the public WDQS endpoint don't depend on a short update lag.
As such, we have increased the alerting thresholds on update lag for this public cluster to 6h / 12h for WARNING / CRITICAL [1]. This does not actually change the quality of service of the WDQS public endpoints, but it brings expectations closer to reality. It also means that all alerts raised by WDQS should now be treated seriously, and not ignored as known issues with no immediate solution.
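To illustrate how the new thresholds map an update lag to an alert state, here is a minimal sketch. It is not the actual Icinga check from [1]: the function name `classify_lag` is hypothetical, and it assumes an alert fires only when the lag is strictly greater than the threshold.

```python
from datetime import timedelta

# Thresholds from this announcement: 6h for WARNING, 12h for CRITICAL.
WARNING_LAG = timedelta(hours=6)
CRITICAL_LAG = timedelta(hours=12)


def classify_lag(lag: timedelta) -> str:
    """Return the alert state for a given update lag.

    Assumption: a lag exactly equal to a threshold does not trigger it.
    """
    if lag > CRITICAL_LAG:
        return "CRITICAL"
    if lag > WARNING_LAG:
        return "WARNING"
    return "OK"
```

With these values, a lag of 3 hours is OK, 7 hours is WARNING, and 13 hours is CRITICAL.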
At the same time, we're having a conversation about what the service level of that cluster should be [2]. Feel free to join that conversation if you are impacted (or just if you have interesting thoughts on the subject).
Thanks for your patience,
Guillaume
[1] https://gerrit.wikimedia.org/r/c/operations/puppet/+/470819
[2] https://phabricator.wikimedia.org/T199228
--
Guillaume Lederrey
Operations Engineer, Search Platform
Wikimedia Foundation
UTC+1 / CET