[teampractices] [Engineering] Log ownership and deployment process

Chris McMahon cmcmahon at wikimedia.org
Wed Feb 11 16:13:16 UTC 2015


I've mentioned this briefly to Arthur: now that we have a Team Practices
Group, I think we should (at least) let them know when we identify
something needing a cultural change, especially if we're going to try to
effect such a change. This is apropos of
https://phabricator.wikimedia.org/T89049

On Tue, Feb 10, 2015 at 8:27 PM, Ori Livneh <ori at wikimedia.org> wrote:

>
>
> On Tue, Feb 10, 2015 at 3:44 PM, Nuria Ruiz <nuria at wikimedia.org> wrote:
>
>> > One possibility is channeling the errors to Sentry which has
>> > Phabricator integration. The ticket for doing that in beta is
>> > https://phabricator.wikimedia.org/T85239, I'm hoping to be able to work
>> > on it within a few weeks.
>> Sounds really good. I think grouping errors in Sentry and starting to
>> assign the biggest offenders (which might not be the most dangerous, but
>> are the ones that pollute the log the most) via Phabricator tasks will be
>> a step towards the cultural shift Antoine was talking about.
>>
>> On Tue, Feb 10, 2015 at 2:11 PM, Gergo Tisza <gtisza at wikimedia.org>
>> wrote:
>>
>>> On Tue, Feb 10, 2015 at 1:27 PM, Antoine Musso <amusso at wikimedia.org>
>>>  wrote:
>>>
>>>> A feature request for the audience: the ability in Logstash to
>>>> associate a message fingerprint with a Phabricator task.  This way we
>>>> could filter out triaged messages and focus on newcomers.
>>>>
>>>
>>> One possibility is channeling the errors to Sentry which has Phabricator
>>> integration. The ticket for doing that in beta is
>>> https://phabricator.wikimedia.org/T85239, I'm hoping to be able to work
>>> on it within a few weeks.
>>>
>>> Also, it would be nice if the system would take a guess at who caused
>>> the error and alert them directly. Squash, for example, can git blame
>>> the stack trace and find the most recent change: http://squash.io/
>>>
>>
> Ok, but the absence of these conveniences is not a blocker to getting this
> daily routine set up. Chad and Antoine know MediaWiki's logging
> infrastructure better than most.
>
> I agree with Antoine that responsibility for monitoring failures should be
> distributed, but I also note that our attempts to tackle this problem
> collectively have failed. There has to be a cultural change, yes, but a
> specific party has to own this and be accountable. If you come to feel that
> deployers are exploiting you by neglecting to monitor the changes they push
> out, as the Release Engineering team, you have ways to respond.