Hi,

between 5:30 and 7:10 UTC on 2023/09/27 the WDQS servers running in the eqiad datacenter have returned results with more than 10 minutes of lag. This caused bots using MW maxlag [0] to stop functioning properly during that time.
The incident started after a failure of the mirroring system between two of our kafka clusters [1], such incident should not have impacted WDQS but it uncovered improper sandboxing of the WDQS updater test setup [2].

Sorry for the inconvenience.

--
David Causse
Software Engineer, Wikimedia Foundation

0: https://www.mediawiki.org/wiki/Manual:Maxlag_parameter
1: https://wikitech.wikimedia.org/wiki/Incidents/2023-09-27_Kafka-jumbo_mirror-makers
2: https://phabricator.wikimedia.org/T347515