I have to find a convenient time to reboot a key server (an-coord1001)
which supports the analytics services.
Unfortunately, although we have a standby server, the process for
switching the two servers' roles is somewhat arduous, so the pragmatic
option is to schedule a brief period of downtime for the affected
services while the primary server is rebooted. These services are:
Hive, Superset, Oozie, Presto, and DataHub.
The outage should last less than 10 minutes, and I propose to carry
out this maintenance at 09:00 UTC tomorrow, May 5th.
Please do let me know if this is going to impact you negatively and I
will try to find another maintenance window. If you have any other
queries or concerns, please don't hesitate to get in touch.
Thanks and best wishes,
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
In order to upgrade the kernels of various analytics hosts, we have to
reboot the machines, which will make several analytics clients temporarily
unavailable.
The maintenance will be Friday at 17:00-19:00 UTC (10am-12pm Pacific).
While this is happening, the following hosts and services will be
temporarily unavailable for a few minutes at a time:
- stat machines (stat1004.eqiad.wmnet etc.)
- Hadoop UI, Druid UI
If you are planning a long-running query that will overlap with that time,
let me know as soon as possible and we'll find a resolution. Respond to
this email or come visit us on IRC at #wikimedia-analytics on libera.chat.