On Thu, Jun 30, 2016 at 12:17 PM, Luca Toscano <ltoscano(a)wikimedia.org> wrote:
On Wed, Jun 29, 2016 at 10:38 AM, Luca Toscano <ltoscano(a)wikimedia.org> wrote:
Tomorrow morning (Jun 30th, CET timezone) I need to reboot stat1002,
stat1003 and stat1004 for kernel upgrades (Ubuntu security patches). This
could terminate long-running queries or jobs, so please ping me on IRC or
email me if your work can't be postponed or stopped.
Starting the work now; please reach out on the wikimedia-analytics IRC
channel if you run into any issues.
All the reboots have been performed and the maintenance is complete. Sadly,
the analytics1003 reboot caused an outage of our Hive/Oozie/Hue database,
which was running on that host and did not restart properly due to a
misconfiguration. We are going to follow up on this incident to avoid a
similar problem in the future.
Thanks and apologies!