Hi all,
Christian -- thanks for following up on this.
I've created a ticket [1] for this and filed it as a production issue. Kevin -- please triage it tomorrow in standup. We can own the actual incident report, but we'll need some help from Ori in understanding how to perform the post-mortem.
The current status for EventLogging support is that Ori, the Analytics team, the Operations team and the Platform teams are discussing the handover of EventLogging. The Analytics team will own EventLogging as soon as we can, but we need to get consensus on the details.
I've written up our discussions on this wiki page[2]. Please feel free to add/discuss. We've had some preliminary discussions with Andrew Otto but need to follow up with Rob and Ori.
-Toby
[1] https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/1526
[2] https://www.mediawiki.org/wiki/Analytics/EventLogging
On Thu, Apr 3, 2014 at 6:27 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Toby,
and zooooooooom ... there goes another week without us even deciding whether or not we feel responsible for doing the incident documentation and follow-up work. :-D
I feel somewhat embarrassed that after two weeks, and after the ping on mailing lists, we still have not managed to tell Greg at least whether or not we'll work on it.
So, unless you chime in or push back by then, I'll be bold, consider the lip service we have given around EventLogging a commitment, and start working on it on Monday (2014-04-07).
Best regards, Christian
On Thu, Mar 27, 2014 at 06:58:27PM +0100, Christian Aistleitner wrote:
Hi Analytics Dev team,
On Thu, Mar 20, 2014 at 01:20:54PM -0700, Greg Grossmeier wrote:
<quote name="Ori Livneh" date="2014-03-20" time="03:52:01 -0700">
> [ At about 2014-03-18 00:04 UTC, db1047 stopped accepting incoming connections. At some point during the subsequent hour, MariaDB had either crashed or been manually restarted. Sean noticed that the database was choking on some queries from the researchers and notified the wmfresearch list.
Can someone from Analytics own this post-mortem and put it on the wiki: https://wikitech.wikimedia.org/wiki/Incident_documentation
Please add specific next steps (with bug#, RT#s, or gerrit urls), even (especially) things you haven't done yet and are just "nice to have".
it's been a week, and I cannot find the post-mortem Greg requested at the above URL :-/
Neither did I see a response from our team to Greg's email.
I lost track of our EventLogging responsibilities during the recent back and forth. So:
Toby, are we actually grabbing Greg's item or are we pushing back on it?
Best regards, Christian
P.S.: Toby, if we're grabbing it: I totally lack knowledge about both EventLogging and the incident itself. So, be prepared for a doubly slow start if I get to work on it.
-- ---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ---- Companies' registry: 360296y in Linz Christian Aistleitner Gruendbergstrasze 65a Email: christian@quelltextlich.at 4040 Linz, Austria Phone: +43 732 / 26 95 63 Fax: +43 732 / 26 95 63 Homepage: http://quelltextlich.at/