Hi again (for the last time hopefully :),

Hive is back up and running fine. I'll try to write a summary of what happened in https://phabricator.wikimedia.org/T179943 for everybody interested. The regular Hadoop jobs were completely stopped, so there was no issue with data loss/inconsistency, only a temporary unavailability of Hive.

Thanks for the patience!

Luca

2017-12-07 12:36 GMT+01:00 Luca Toscano <ltoscano@wikimedia.org>:
Hi everybody,

we are experiencing some issues with the Hive daemon, so Hive queries are currently unavailable. I will update this thread as soon as the issue is resolved.

For more info, please contact me (elukey) on IRC (#wikimedia-analytics).

Sorry for the trouble!

Luca 

2017-12-06 19:47 GMT+01:00 Luca Toscano <ltoscano@wikimedia.org>:
Hi everybody,

we need to reboot the analytics1003 host for Linux kernel and OpenJDK updates tomorrow, Dec 07, at 10 AM CET. Hive and Oozie will stop for a (hopefully) brief amount of time, but since they'll need to stop before the reboot, in-flight jobs/queries might fail. We'll try to avoid the reboot if too many jobs are running, but at some point we'll need to pull the trigger.

Please let me know on IRC (#wikimedia-analytics, elukey) or via email if you have any issues with this maintenance.

Thanks and sorry for the trouble!

Luca (on behalf of the Analytics team)