Hello all!
TL;DR: the alerting thresholds on the Wikidata Query Service have been
raised, so any Icinga alert should now be treated seriously.
As you might already know, we're having trouble keeping up with updates
on the public Wikidata Query Service cluster. We're working on it, but
it is a hard problem. At the same time, the known use cases of the
public WDQS endpoint don't depend on a short update lag.
As such, we have raised the alerting thresholds on update lag for this
public cluster to 6h for WARNING and 12h for CRITICAL [1]. This does
not actually change the quality of service of the WDQS public
endpoints, but it brings expectations closer to reality. It also means
that all alerts raised by WDQS should now be treated seriously, and not
ignored as known issues with no immediate solution.
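
For illustration only, here is how the new thresholds map an update lag
to an alert state. This is a minimal sketch in Python, not the actual
Icinga check from [1]: the function name classify_lag is hypothetical,
and treating the boundaries as inclusive (exactly 6h is already a
WARNING) is an assumption.

```python
from datetime import timedelta

# Thresholds from this announcement: 6h for WARNING, 12h for CRITICAL.
WARNING_LAG = timedelta(hours=6)
CRITICAL_LAG = timedelta(hours=12)


def classify_lag(lag: timedelta) -> str:
    """Map an update lag to an alert state (hypothetical helper).

    Assumption: boundaries are inclusive. The real check may differ.
    """
    if lag >= CRITICAL_LAG:
        return "CRITICAL"
    if lag >= WARNING_LAG:
        return "WARNING"
    return "OK"
```

With these values, a lag of a few hours no longer alerts at all, which
is why any alert that does fire now deserves attention.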
We're also having a conversation about what the service level of that
cluster should be [2]. Feel free to join that conversation if you are
impacted (or just if you have interesting thoughts on the subject).
Thanks for your patience,
Guillaume
[1] https://gerrit.wikimedia.org/r/c/operations/puppet/+/470819
[2] https://phabricator.wikimedia.org/T199228
--
Guillaume Lederrey
Operations Engineer, Search Platform
Wikimedia Foundation
UTC+1 / CET
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata