Hi,
between 5:30 and 7:10 UTC on 2023/09/27 the WDQS servers running in the eqiad datacenter have returned results with more than 10 minutes of lag. This caused bots using MW maxlag [0] to stop functioning properly during that time. The incident started after a failure of the mirroring system between two of our kafka clusters [1], such incident should not have impacted WDQS but it uncovered improper sandboxing of the WDQS updater test setup [2].
Sorry for the inconvenience.
-- David Causse Software Engineer, Wikimedia Foundation
0: https://www.mediawiki.org/wiki/Manual:Maxlag_parameter 1: https://wikitech.wikimedia.org/wiki/Incidents/2023-09-27_Kafka-jumbo_mirror-... 2: https://phabricator.wikimedia.org/T347515