Hi,
between 5:30 and 7:10 UTC on 2023/09/27 the WDQS servers running in the
eqiad datacenter have returned results with more than 10 minutes of lag.
This caused bots using MW maxlag [0] to stop functioning properly during
that time.
The incident started after a failure of the mirroring system between two of
our kafka clusters [1], such incident should not have impacted WDQS but it
uncovered improper sandboxing of the WDQS updater test setup [2].
Sorry for the inconvenience.
--
David Causse
Software Engineer, Wikimedia Foundation
0:
https://www.mediawiki.org/wiki/Manual:Maxlag_parameter
1:
https://wikitech.wikimedia.org/wiki/Incidents/2023-09-27_Kafka-jumbo_mirror…
2:
https://phabricator.wikimedia.org/T347515