Hello,
Apologies for the short notice. The SRE team will be carrying out an
upgrade of the switches in eqiad row D later today
(https://phabricator.wikimedia.org/T333377) at approximately 14:00 UTC.
The network outage to this row resulting from this work is expected to
be around 30 minutes, all being well.
In support of this work, the Data Engineering team will be putting the HDFS
file system into safe mode at approximately 13:30 UTC today, which means
that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same
time, so please try to plan any work that you may have for the cluster
to avoid this maintenance window.
Read-only access to Hive, Presto, Superset, and Turnilo should continue to
function normally throughout the maintenance window.
Finally, two of the stats servers (stat1005 and stat1006) will be
unavailable, so please save any work that you may have on these servers
before the loss of connectivity.
Please do reach out via any of the normal channels (email:
analytics(a)lists.wikimedia.org, IRC: #wikimedia-analytics, Slack:
#data-engineering) if you have any queries or concerns.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
The SRE team will be carrying out an upgrade of the switches in eqiad row
C (https://phabricator.wikimedia.org/T331882) at 13:00 UTC. The network
outage to this row resulting from this work is expected to be around 30
minutes, all being well.
In support of this work, the Data Engineering team will be putting the HDFS
file system into safe mode at approximately 12:30 UTC today, which means
that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same
time, so please try to plan any work that you may have for the cluster to
avoid this maintenance window.
Some additional internal-facing services for analytics such as Hive,
Superset, Presto, and the Druid-analytics cluster will also be largely
unavailable for some periods while the switch upgrade takes place.
The public-facing Analytics Query Service (AQS) will continue to function,
albeit with a degraded response to some queries. However, Wikistats
(stats.wikimedia.org) will be unavailable whilst the switch upgrade is in
progress.
Finally, two of the stats servers, stat1007 and stat1009, will be
unavailable, so please save any work that you may have on these servers
before the loss of connectivity.
Please reach out via any of the normal channels (email:
analytics(a)lists.wikimedia.org, IRC: #wikimedia-analytics, Slack:
#data-engineering) if you have any queries or concerns.
Kind regards,
Steve Munene
Hi everybody,
Today the SRE team will perform network maintenance in our eqiad cluster,
and Superset/Turnilo/Matomo UIs may be unavailable between 13:00 and 15:00
UTC. Please check https://phabricator.wikimedia.org/T331882 for more info.
As always, if you have any questions, please reach out to the Data
Engineering team :)
Luca