Hi all,
I just caused another small webrequest log data loss. Between 21:54 and 22:15 UTC today, a Puppet change I merged, which was supposed to have no effect, dropped an important firewall rule dealing with IPSec. This kept all varnishkafkas in remote datacenters from producing to Kafka during that window.
I have documented this here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_kn...
Apologies to all!
-Andrew Otto
---------- Forwarded message ----------
From: Marcel Ruiz Forns mforns@wikimedia.org
Date: Wed, Dec 16, 2015 at 10:29 AM
Subject: [Analytics] [Outage] Small data loss in raw_webrequest on 2015-12-15
To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." analytics@lists.wikimedia.org
Hi Analytics,
Yesterday, Dec 15, during the hour from 17:00 to 18:00 UTC, there was an irrecoverable raw_webrequest data loss of ~30% overall. By source, the loss was 25.6% (misc), 19.5% (mobile), 19.1% (text), and 39.1% (upload). This represents around 1% of the data for that day.
The loss was caused by enabling IPSec, which encrypts varnishkafka traffic between caches in remote datacenters and the Kafka brokers in eqiad. For about 40 minutes, no webrequest logs from remote datacenters were successfully produced to Kafka.
Here's the outage note: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_kn...
Sorry for the inconvenience.