Hi everybody,
today, while performing maintenance on the Eventlogging Master database, we ended up in https://phabricator.wikimedia.org/T188991 (TL;DR: two hours of data were inserted into the slave database instead of the master). We are working on a feasible solution to avoid losing data and to get out of this inconsistent state, so as a precautionary measure the Eventlogging MySQL consumers have been stopped.
A couple of notes:
- The Eventlogging machinery is working as expected, except for MySQL insertion, of course.
- The HDFS data has not been affected by this issue.
Please check the task for more updates, or follow up with the Analytics team on IRC (#wikimedia-analytics on freenode).
Thanks and sorry for the trouble!
Luca (on behalf of the Analytics team)
Update: the data should now be recovered and everything should be back on track. The latest data might take a while to catch up, since we have just restarted the replication script on the analytics-slave.
All the details about the outage are in https://phabricator.wikimedia.org/T188991
Thanks!
Luca
2018-03-06 13:55 GMT+01:00 Luca Toscano ltoscano@wikimedia.org:
Hi everybody,
today, while performing maintenance on the Eventlogging Master database, we ended up in https://phabricator.wikimedia.org/T188991 (TL;DR: two hours of data were inserted into the slave database instead of the master). We are working on a feasible solution to avoid losing data and to get out of this inconsistent state, so as a precautionary measure the Eventlogging MySQL consumers have been stopped.
A couple of notes:
- The Eventlogging machinery is working as expected, except for MySQL
insertion, of course.
- The HDFS data has not been affected by this issue.
Please check the task for more updates, or follow up with the Analytics team on IRC (#wikimedia-analytics on freenode).
Thanks and sorry for the trouble!
Luca (on behalf of the Analytics team)