Sincere apologies; I specified the time of this maintenance window incorrectly in my previous email.

The switch maintenance will be carried out by SRE at 13:00 UTC today and not 14:00 UTC as I had incorrectly stated.

Therefore, our preparatory work to make HDFS read-only and prevent new jobs from being launched on YARN will commence at approximately 12:30 UTC.
I have corrected the text below.

Kind regards,
Ben

On 18/04/2023 11:25, Ben Tullis wrote:

Hello,

Apologies for the short notice. The SRE team will be carrying out an upgrade of the switches in eqiad row D later today (https://phabricator.wikimedia.org/T333377) at approximately 13:00 UTC.
The network outage to this row resulting from this work is expected to last around 30 minutes, all being well.

In support of this work, the Data Engineering team will be putting the HDFS file system into safe mode at approximately 12:30 UTC today, which means that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same time, so please try to plan any work that you may have for the cluster to avoid this maintenance window.

Read-only access to Hive, Presto, Superset, and Turnilo should continue to function normally throughout the maintenance window.

Finally, two of the stats servers (stat1005 and stat1006) will be unavailable, so please save any work that you may have on these servers before the loss of connectivity.

Please do reach out via any of the normal channels (email: analytics@lists.wikimedia.org, IRC: #wikimedia-analytics, Slack: #data-engineering) if you have any queries or concerns.

Kind regards,
Ben

Ben Tullis (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation