On Thu, Jun 30, 2016 at 12:17 PM, Luca Toscano ltoscano@wikimedia.org wrote:
On Wed, Jun 29, 2016 at 10:38 AM, Luca Toscano ltoscano@wikimedia.org wrote:
Hi!
Tomorrow morning (Jun 30th, CET timezone) I need to reboot stat1002, stat1003 and stat1004 for kernel upgrades (Ubuntu security patches). This could terminate long-running queries or jobs, so please ping me on IRC or email me if your work can't be postponed or stopped.
Starting the work now. Please reach out on the wikimedia-analytics IRC channel if you run into any issues.
All the reboots have been performed and the maintenance is complete. Sadly, the analytics1003 reboot caused an outage of our Hive/Oozie/Hue database, which was running on that host and did not restart properly due to a misconfiguration. We are going to follow up on this incident to avoid similar problems in the future.
Thanks and apologies!
Luca