Hi,
the m2 master crashed today (investigation still ongoing), which left EventLogging unable to write events to the database on 2014-11-18 between 14:14 and 15:02.
The data for that period is not lost; it is available in backup files, waiting to be re-injected into the database.
Just wanted to let you know what happened, in case you notice drops in graphs / dashboards during that period.
Sorry for the inconvenience,
Christian
It looks like the same problem is happening now. No new events have been written to the log tables on analytics-store for about the past hour and a half. And it looks like the slave db stopped replicating about 6 hours ago.
Ryan Kaldari
On Tue, Nov 18, 2014 at 9:31 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,
the m2 master crashed today (investigation still ongoing), which left EventLogging unable to write events to the database on 2014-11-18 between 14:14 and 15:02.
The data for that period is not lost; it is available in backup files, waiting to be re-injected into the database.
Just wanted to let you know what happened, in case you notice drops in graphs / dashboards during that period.
Sorry for the inconvenience,
Christian
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3, 4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Note that enwiki on analytics-slave appears to be up to date.
[enwiki]> SELECT TIMEDIFF(NOW(), MAX(rc_timestamp)) FROM recentchanges;
+------------------------------------+
| TIMEDIFF(NOW(), MAX(rc_timestamp)) |
+------------------------------------+
| 00:00:00.000000                    |
+------------------------------------+
1 row in set (0.01 sec)
But log seems to be 1 hour and 42 minutes behind.
[log]> select timediff(NOW(), max(timestamp)) from PageContentSaveComplete_5588433;
+---------------------------------+
| timediff(NOW(), max(timestamp)) |
+---------------------------------+
| 01:41:56.000000                 |
+---------------------------------+
1 row in set (0.00 sec)
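
For anyone who wants to script this comparison rather than run it by hand, here is a minimal Python sketch of the same lag calculation. It assumes the newest timestamp comes back as a 14-digit MediaWiki-style string (YYYYMMDDHHMMSS); the function name and the sample values are made up for illustration, and the sketch does not query the database itself.

```python
from datetime import datetime, timedelta

# Assumption: timestamps are 14-digit MediaWiki-style strings.
MW_FORMAT = "%Y%m%d%H%M%S"


def replication_lag(max_timestamp: str, now: datetime) -> timedelta:
    """Return how far the newest row lags behind `now`."""
    newest = datetime.strptime(max_timestamp, MW_FORMAT)
    return now - newest


# Hypothetical values, chosen so the result matches the 01:41:56 lag above.
now = datetime(2014, 11, 18, 21, 0, 0)
lag = replication_lag("20141118191804", now)
print(lag)
```

Running the same function against each table of interest would show which ones have fallen behind, much like the two queries above do.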
Hi Ryan,
On Tue, Nov 18, 2014 at 11:48:55AM -0800, Ryan Kaldari wrote:
It looks like the same problem is happening now. No new events have been written to the log tables on analytics-store for about the past hour and a half.
Yes, there were EventLogging issues for at least the following periods (maybe more; I'll do a check and full write-up once I've gotten some sleep):
* 2014-11-18T18:15 -- 2014-11-18T21:05, and
* 2014-11-18T23:19 -- 2014-11-19T00:07.
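
To check whether a data point in a graph or dashboard falls into one of the affected periods, something like the following Python sketch may help. It lists the two periods above plus the earlier 14:14 to 15:02 window from the first message in this thread. The thread does not state a time zone, so naive datetimes are used here, and the names `AFFECTED_WINDOWS` and `is_affected` are hypothetical.

```python
from datetime import datetime, timedelta

# Windows as reported in this thread. The time zone is not stated there,
# so these are naive datetimes (an assumption of this sketch).
AFFECTED_WINDOWS = [
    (datetime(2014, 11, 18, 14, 14), datetime(2014, 11, 18, 15, 2)),
    (datetime(2014, 11, 18, 18, 15), datetime(2014, 11, 18, 21, 5)),
    (datetime(2014, 11, 18, 23, 19), datetime(2014, 11, 19, 0, 7)),
]


def is_affected(moment: datetime) -> bool:
    """Return True if `moment` lies inside any reported window (inclusive)."""
    return any(start <= moment <= end for start, end in AFFECTED_WINDOWS)


# Total time covered by the reported windows.
total_outage = sum((end - start for start, end in AFFECTED_WINDOWS), timedelta())

print(is_affected(datetime(2014, 11, 18, 19, 0)))
print(total_outage)
```

The list would need updating if the full write-up turns up further periods.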
Recent EventLogging changes caused the process that writes to the database to hit Python's recursion limit, which led to the issues above.
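
This thread does not say which code path hit the limit, so the following Python sketch is only a generic illustration of the failure mode, not the EventLogging code. It shows a recursive helper raising `RecursionError` on deeply nested input (in Python 2 this surfaced as a `RuntimeError`), next to an iterative version that is not bound by the limit. All function names are hypothetical.

```python
import sys


def depth_recursive(nested):
    """Count nesting depth recursively; fails on very deep input."""
    if not isinstance(nested, list) or not nested:
        return 0
    return 1 + depth_recursive(nested[0])


def depth_iterative(nested):
    """Count nesting depth with a loop, so the call stack stays flat."""
    depth = 0
    while isinstance(nested, list) and nested:
        nested = nested[0]
        depth += 1
    return depth


def make_nested(levels):
    """Build a list nested `levels` deep, without recursing."""
    value = "x"
    for _ in range(levels):
        value = [value]
    return value


# Nest deeper than the interpreter's current recursion limit.
deep = make_nested(sys.getrecursionlimit() + 100)

try:
    depth_recursive(deep)
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit, depth_iterative(deep))
```

Raising the limit with `sys.setrecursionlimit` only moves the threshold; rewriting the recursion as a loop removes it.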
EventLogging was rolled back to the previously deployed version (thanks, ori!) and has been doing fine since.
Proper incident report to follow.
Sorry for the inconvenience,
Christian
Thanks Christian.
Hi,
On Wed, Nov 19, 2014 at 02:20:20AM +0100, Christian Aistleitner wrote:
Proper incident report to follow.
https://wikitech.wikimedia.org/wiki/Incident_documentation/20141118-EventLog...
Have fun, Christian