Hi Analytics Dev team,
On Thu, Mar 20, 2014 at 01:20:54PM -0700, Greg Grossmeier wrote:
<quote name="Ori Livneh" date="2014-03-20" time="03:52:01 -0700">
> [ At about 2014-03-18 00:04 UTC, db1047 stopped accepting incoming
> connections. At some point during the subsequent hour, MariaDB had either
> crashed or been manually restarted. Sean noticed that the database was
> choking on some queries from the researchers and notified the wmfresearch
> list.
Can someone from Analytics own this post-mortem and put it on the wiki: https://wikitech.wikimedia.org/wiki/Incident_documentation
Please add specific next steps (with bug#, RT#s, or gerrit urls), even (especially) things you haven't done yet and are just "nice to have".
It's been a week, and I cannot find the post-mortem Greg requested at the above URL :-/
Nor did I see a response from our team to Greg's email.
I lost track of our EventLogging responsibilities during the recent back and forth. So:
Toby, are we actually grabbing Greg's item or are we pushing back on it?
Best regards, Christian
P.S.: Toby, if we're grabbing it: I know next to nothing about either EventLogging or the incident itself, so be prepared for a doubly slow start if I get to work on it.