Heads up in case you query Event Logging tables.
---------- Forwarded message ----------
From: *Marcel Ruiz Forns* <mforns(a)wikimedia.org>
Date: Monday, November 30, 2015
Subject: [Analytics] EventLogging outage in progress?
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics." <analytics(a)lists.wikimedia.org>
Team, I checked and, indeed, the EventLogging database needs backfilling from
2015-11-27 01:00 until 2015-11-27 07:00. I updated the docs and started the
backfilling process. I'll let you know when it is finished.
Cheers
On Fri, Nov 27, 2015 at 8:31 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
It seems like it would depend on the class of error. 48 hours for
events not syncing, fine. 48 hours of /total data loss/ is a
completely different class of problem.
On 27 November 2015 at 11:35, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> Unfortunately, the only team-members working full-time yesterday and today
> are we Europe folks.
> We weren't there when that happened and we don't get those alerts on the
> phone, we should though.
Given that this system is tier-2, I do not think we need an immediate
response; 24 hours should be an acceptable ETA. I would say even 48.
On Fri, Nov 27, 2015 at 2:31 AM, Marcel Ruiz Forns <mforns(a)wikimedia.org> wrote:
>
> Thanks, Ori, for having a look at this and restarting EL.
>
> I understand it was 01:30 UTC on Friday (today), not Thursday. It went on
> for 5-6 hours.
> Unfortunately, the only team-members working full-time yesterday and today
> are we Europe folks.
> We weren't there when that happened and we don't get those alerts on the
> phone, we should though.
>
> This problem happened already like a month ago. We'll backfill the missing
> events and will investigate.
> Thanks again for the heads-up.
>
> On Fri, Nov 27, 2015 at 8:01 AM, Ori Livneh <ori(a)wikimedia.org> wrote:
>>
>> On Thu, Nov 26, 2015 at 10:46 PM, Ori Livneh <ori(a)wikimedia.org> wrote:
>>>
>>> Seems that eventlog1001 has not received any events since 01:30 UTC on
>>> Thursday
>>>
>>> http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Misc…
>>>
>>> This is pretty severe; I'd page if it wasn't a US holiday.
>>
>>
>> Kafka clients on eventlog1001 were in an "Autocommitting consumer offset"
>> death-loop and not receiving any events from the Kafka brokers. I ran
>> eventloggingctl stop / eventloggingctl start and they recovered. Needs to be
>> investigated more thoroughly. Otto, can you follow up?
>>
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>
>
>
> --
> Marcel Ruiz Forns
> Analytics Developer
> Wikimedia Foundation
>
--
Oliver Keyes
Count Logula
Wikimedia Foundation