Team, I checked and, indeed, the EventLogging database needs backfilling from 2015-11-27 01:00 until 2015-11-27 07:00. I updated the docs and started the backfilling process. I'll let you know when it is finished.
Cheers

On Fri, Nov 27, 2015 at 8:31 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
It seems like it would depend on the class of error. 48 hours for
events not syncing, fine. 48 hours of /total data loss/ is a
completely different class of problem.

On 27 November 2015 at 11:35, Nuria Ruiz <nuria@wikimedia.org> wrote:
>> Unfortunately, the only team-members working full-time yesterday and today
>> are we Europe folks.
>> We weren't there when that happened and we don't get those alerts on the
>> phone, we should though.
> Given that this system is tier-2, I do not think we need an immediate
> response; 24 hours should be an acceptable ETA. I would say even 48.
>
> On Fri, Nov 27, 2015 at 2:31 AM, Marcel Ruiz Forns <mforns@wikimedia.org>
> wrote:
>>
>> Thanks, Ori, for having a look at this and restarting EL.
>>
>> I understand it was 01:30 UTC on Friday (today), not Thursday. It went on
>> for 5-6 hours.
>> Unfortunately, the only team-members working full-time yesterday and today
>> are we Europe folks.
>> We weren't there when that happened and we don't get those alerts on the
>> phone, we should though.
>>
>> This problem already happened about a month ago. We'll backfill the missing
>> events and investigate.
>> Thanks again for the heads-up.
>>
>> On Fri, Nov 27, 2015 at 8:01 AM, Ori Livneh <ori@wikimedia.org> wrote:
>>>
>>> On Thu, Nov 26, 2015 at 10:46 PM, Ori Livneh <ori@wikimedia.org> wrote:
>>>>
>>>> Seems that eventlog1001 has not received any events since 01:30 UTC on
>>>> Thursday
>>>>
>>>>
>>>> http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Miscellaneous+eqiad&h=eventlog1001.eqiad.wmnet&jr=&js=&event=hide&ts=0&v=140128.28&m=bytes_in&vl=bytes%2Fsec&ti=Bytes+Received
>>>>
>>>> This is pretty severe; I'd page if it wasn't a US holiday.
>>>
>>>
>>> Kafka clients on eventlog1001 were in an "Autocommitting consumer offset"
>>> death-loop and not receiving any events from the Kafka brokers. I ran
>>> eventloggingctl stop / eventloggingctl start and they recovered. Needs to be
>>> investigated more thoroughly. Otto, can you follow up?
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>>
>>
>> --
>> Marcel Ruiz Forns
>> Analytics Developer
>> Wikimedia Foundation
>>
>
>



--
Oliver Keyes
Count Logula
Wikimedia Foundation



--
Marcel Ruiz Forns
Analytics Developer
Wikimedia Foundation