Good morning,
I have to find a convenient time to reboot a key server (an-coord1001)
which supports the analytics services.
Unfortunately, although we have a standby server, the process for
switching the two servers' roles is somewhat arduous, so the pragmatic
option is to schedule a brief period of downtime for the affected
services while the primary server is rebooted. These services are:
Hive, Superset, Oozie, Presto, and DataHub.
The outage should last less than 10 minutes, and I propose to carry
out this maintenance at 09:00 UTC tomorrow, May 5th.
Please do let me know if this is going to impact you negatively and I
will try to find another maintenance window. If you have any other
queries or concerns, please don't hesitate to get in touch.
Thanks and best wishes,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
In order to upgrade the kernels of various analytics hosts, we have to
reboot the machines, which will make several analytics clients temporarily
unavailable.
The maintenance will take place on Friday from 17:00 to 19:00 UTC
(10am-12pm Pacific). During this window, the following hosts and
services will be unavailable for a few minutes at a time:
- stat machines (stat1004.eqiad.wmnet, etc.)
- Superset
- Turnilo
- Hadoop UI, Druid UI
If you are planning a long-running query that will overlap with that time,
let me know as soon as possible and we'll find a resolution. Respond to
this email or come visit us on IRC at #wikimedia-analytics on libera.chat.
Regards,
Razzi