[Engineering] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

Luca Toscano ltoscano at wikimedia.org
Thu Dec 7 12:35:29 UTC 2017


Hi again (for the last time hopefully :),

Hive is back up and running fine. I'll try to write a summary of what happened
in https://phabricator.wikimedia.org/T179943 for everybody interested. The
regular Hadoop jobs were completely stopped so there was no issue with data
loss/inconsistency, only a temporary unavailability of Hive.

Thanks for the patience!

Luca

2017-12-07 12:36 GMT+01:00 Luca Toscano <ltoscano at wikimedia.org>:

> Hi everybody,
>
> we are experiencing some issues with the Hive daemon, so currently Hive
> queries are not available. I am going to update this thread as soon as the
> issue is over.
>
> For more info, please contact me (elukey) on IRC (#wikimedia-analytics).
>
> Sorry for the trouble!
>
> Luca
>
> 2017-12-06 19:47 GMT+01:00 Luca Toscano <ltoscano at wikimedia.org>:
>
>> Hi everybody,
>>
>> we need to reboot the analytics1003 host for Linux kernel and OpenJDK
>> updates tomorrow Dec 07 at 10 AM CET. Hive and Oozie will stop for a
>> (hopefully) brief amount of time, but since they'll need to stop before the
>> reboot, in-flight jobs/queries might fail. We'll try to avoid
>> the reboot if too many jobs are running, but at some point we'll need to
>> pull the trigger.
>>
>> Please let me know on IRC (#wikimedia-analytics, elukey) or via email if
>> you have any issue with this maintenance.
>>
>> Thanks and sorry for the trouble!
>>
>> Luca (on behalf of the Analytics team)
>>
>
>