Hi everybody,
We need to reboot the analytics1003 host for Linux kernel and OpenJDK updates tomorrow, Dec 07, at 10 AM CET. Hive and Oozie will be unavailable for a (hopefully) brief time, and since they need to be stopped before the reboot, in-flight jobs/queries might fail. We'll try to hold off on the reboot if too many jobs are running, but at some point we'll need to pull the trigger.
Please let me know on IRC (#wikimedia-analytics, elukey) or via email if you have any issues with this maintenance.
Thanks and sorry for the trouble!
Luca (on behalf of the Analytics team)
Hi everybody,
We are experiencing some issues with the Hive daemon, so Hive queries are currently unavailable. I will update this thread as soon as the issue is resolved.
For more info, please contact me (elukey) on IRC (#wikimedia-analytics).
Sorry for the trouble!
Luca
2017-12-06 19:47 GMT+01:00 Luca Toscano ltoscano@wikimedia.org:
Hi everybody,
We need to reboot the analytics1003 host for Linux kernel and OpenJDK updates tomorrow, Dec 07, at 10 AM CET. Hive and Oozie will be unavailable for a (hopefully) brief time, and since they need to be stopped before the reboot, in-flight jobs/queries might fail. We'll try to hold off on the reboot if too many jobs are running, but at some point we'll need to pull the trigger.
Please let me know on IRC (#wikimedia-analytics, elukey) or via email if you have any issues with this maintenance.
Thanks and sorry for the trouble!
Luca (on behalf of the Analytics team)
Hi Luca,
Well, given that you already have to deal with Hive today, I just wanted to report that over the past few days the HS2 server has rejected my queries a few times, reporting that the most likely reason is the number of open connections. I guess some defensive programming in my R scripts will take care of running the queries when the rush is less dense; however, nothing similar happened in the previous months, so I wanted to let you know.
As you know, I'm not in Data Engineering, so I can't tell whether this has anything to do with the HS2 settings as they were planned by Analytics-Engineering. Maybe nothing needs to be changed.
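To illustrate the kind of defensive programming mentioned above, here is a minimal sketch of a retry loop with exponential backoff. It is written in Python rather than R purely for illustration and is not the code of the scripts in question. `QueryRejected`, `run_with_retry`, and the `run_query` callable are hypothetical names, and the attempt count and delays are arbitrary assumptions, not recommended values for HS2.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QueryRejected(Exception):
    """Hypothetical error raised when the server refuses a query."""


def run_with_retry(
    run_query: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query, retrying with exponential backoff on rejection."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryRejected:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is only there so the waiting can be replaced in tests; in a real script `run_query` would wrap whatever call submits the query and would translate a rejected connection into `QueryRejected`.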
Good luck with the daemon.
Best,
Goran S. Milovanović, PhD
Data Scientist, Software Department
Wikimedia Deutschland
------------------------------------------------
"It's not the size of the dog in the fight, it's the size of the fight in the dog." - Mark Twain
------------------------------------------------
On Thu, Dec 7, 2017 at 12:36 PM, Luca Toscano ltoscano@wikimedia.org wrote:
Hi everybody,
We are experiencing some issues with the Hive daemon, so Hive queries are currently unavailable. I will update this thread as soon as the issue is resolved.
For more info, please contact me (elukey) on IRC (#wikimedia-analytics).
Sorry for the trouble!
Luca
Engineering mailing list Engineering@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/engineering
Hi again (for the last time hopefully :),
Hive is back up and running fine. I'll try to write a summary of what happened in https://phabricator.wikimedia.org/T179943 for anyone interested. The regular Hadoop jobs were completely stopped, so there was no data loss or inconsistency, only a temporary unavailability of Hive.
Thanks for the patience!
Luca