Update: the data should now be recovered and everything should be back on track. The latest data might take some time to catch up, since we have just restarted the replication script on the analytics-slave.

All the details about the outage are in https://phabricator.wikimedia.org/T188991

Thanks!

Luca

2018-03-06 13:55 GMT+01:00 Luca Toscano <ltoscano@wikimedia.org>:
Hi everybody,

today, while performing maintenance on the Eventlogging master database, we ran into the problem described in https://phabricator.wikimedia.org/T188991 (TL;DR: two hours of data were inserted into the slave database instead of the master). We are working on a feasible way to avoid losing data and to get out of this inconsistent state, so as a precautionary measure the Eventlogging MySQL consumers have been stopped.

A couple of notes:

- The Eventlogging machinery is working as expected, except of course for the MySQL insertion.
- The HDFS data has not been affected by this issue.

Please check the task for more updates, or follow up with the Analytics team on IRC (#wikimedia-analytics on freenode).

Thanks and sorry for the trouble!

Luca (on behalf of the Analytics team)