Hello,
Tomorrow the SRE team will be carrying out an upgrade of the switches in
eqiad row B (https://phabricator.wikimedia.org/T330165) at 14:00 UTC.
The network outage to this row resulting from this work is expected to
be around 30 minutes, all being well.
In support of this work, the Data Engineering team will be putting the HDFS
file system into safe mode at approximately 13:30 UTC tomorrow, which
means that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same
time, so please try to plan any work that you may have for the cluster
to avoid this maintenance window.
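If it is useful, here is a quick way to check from a stat host whether the
cluster has left safe mode before you resume any writes (a sketch, assuming
the standard hdfs command-line client is available in your session):

    # Sketch: ask the NameNode whether HDFS is still in safe mode.
    # Uses only the stock 'hdfs dfsadmin -safemode get' command via subprocess.
    import subprocess

    status = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(status)  # e.g. "Safe mode is ON" while the maintenance is in progress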
Some additional internal-facing analytics services, such as Hive,
Superset, Presto, and the Druid analytics cluster, will also be largely
unavailable for some periods while the switch upgrade takes place.
The public-facing Analytics Query Service (AQS) will continue to
function, albeit with a degraded response to some queries. However,
Wikistats (stats.wikimedia.org) will be unavailable whilst the switch
upgrade is in progress.
Finally, two of the stats servers, stat1007 and stat1009, will be
unavailable, so please save any work that you may have on these servers
before the loss of connectivity.
Please do reach out via any of the normal channels (email:
analytics(a)lists.wikimedia.org, IRC: #wikimedia-analytics, Slack:
#data-engineering) if you have any queries or concerns.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello and apologies for the short notice.
We are required to put HDFS into safe mode at approximately 13:50
UTC today, which means that the file system will be read-only.
This might be for as little as 30 minutes, but the maintenance window
we're working within is for up to 2 hours, so the actual period of
read-only access will depend on the outcome of the upgrade of the eqiad
row A switches (https://phabricator.wikimedia.org/T329073) by the
Infrastructure Foundations team.
We will be pausing ingestion to the Data Lake a little ahead of this
time, so there will be a delay in dataset availability on HDFS,
Cassandra, Druid, etc.
Apologies for any inconvenience that this disruption to service will
cause you.
Please do let us know by reply to this list or in #wikimedia-analytics
on IRC if you have any queries, or would like to follow along with our
support of the maintenance work.
Kind regards,
Ben Tullis
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Sorry for the cross-post; forwarding here in case people are not on ops.
Please feel free to forward this message to anyone else who you feel may
be interested.
Thanks
---------- Forwarded message ---------
From: John Bond <jbond(a)wikimedia.org>
Date: Mon, Feb 6, 2023 at 3:22 PM
Subject: NO_PROXY setting for interactive shells
To: Operations Engineers <ops(a)lists.wikimedia.org>
Hi All
As part of an ongoing effort to tighten the policies on the web proxy
servers[1], I would like to deploy a change[2] that would inject a sane
no_proxy environment variable into login sessions via systemd. This means
traffic towards our own websites will not risk going through the proxy
servers when a user manually sets the http(s)_proxy environment
variable(s). It will improve resiliency and latency and reduce load on the
proxies, while also reducing user overhead (setting no_proxy has so far been
a required manual action [3]).
No impact is expected.
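For anyone curious about the semantics, here is a small illustration of how
clients that honour no_proxy behave once it is set; the variable values below
are made up for the example, and the real list will come from the change [2]:

    # Illustration only: with no_proxy set, traffic to matching hosts bypasses
    # the proxy even when http(s)_proxy is exported in the same session.
    import os
    import urllib.request

    os.environ["https_proxy"] = "http://webproxy.example:8080"  # hypothetical proxy
    os.environ["no_proxy"] = ".wikimedia.org,.wmnet,localhost"  # hypothetical list

    # On Linux, urllib's proxy_bypass() consults the no_proxy environment variable.
    print(urllib.request.proxy_bypass("gerrit.wikimedia.org"))  # True  -> goes direct
    print(urllib.request.proxy_bypass("pypi.org"))              # False -> via proxy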
I plan to deploy this change next Monday, 13 Feb, at 12:00 UTC. If you have
any concerns or questions, please let me know either via email or on the
change.
Thanks,
John
[1] https://phabricator.wikimedia.org/T300977
[2] https://gerrit.wikimedia.org/r/c/operations/puppet/+/879418
[3] https://wikitech.wikimedia.org/wiki/HTTP_proxy#How-to
Hello folks,
*What:*
We'd like to schedule a maintenance window for all the stat machines:
stat1004.eqiad.wmnet
stat1005.eqiad.wmnet
stat1006.eqiad.wmnet
stat1007.eqiad.wmnet
stat1008.eqiad.wmnet
*In this maintenance window, we will:*
* Upgrade the conda-analytics debian package. This package is how we deploy
Spark3. The upgrade will allow us to run JupyterHub and JupyterLab on top
of it.
* Deploy new puppet configurations that will switch from running Jupyter on
top of anaconda-wmf to running on top of conda-analytics.
* This effectively upgrades our Jupyter deployment as follows:
* Upgrades JupyterHub from 1.1.0 to 1.5.0.
* Upgrades JupyterLab from 3.2.9 to 3.4.8.
* Upgrades Spark on newly created conda environments from 2.4.4 to
3.1.2.
* Upgrades wmfdata on newly created conda environments to 2.0.0. This
version of wmfdata includes breaking changes.
*IMPORTANT*: After this maintenance window, you should expect the following:
*** JupyterHub and any existing JupyterLab processes, including any running
kernels, will be shut down and will have to be restarted manually by the
respective owners. ***
*** JupyterLab UI will have minor changes. ***
*** New conda environments created via JupyterHub will now be based off of
conda-analytics and will utilize Spark3. ***
*** Existing conda environments based off of anaconda-wmf will continue to
work, and continue to utilize Spark2. ***
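Once the window is over, a quick check from a notebook kernel will tell you
which stack a given environment is on (a sketch, assuming pyspark is
importable from that environment):

    # Run from a Jupyter kernel after the maintenance window; the expected
    # values are the ones listed in this announcement.
    import pyspark
    print(pyspark.__version__)  # 3.1.2 in a new conda-analytics environment,
                                # 2.4.4 in an existing anaconda-wmf environment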
*Why: *
As part of our effort to deprecate Spark2 and make Spark3 widely available,
we are deprecating the use of anaconda-wmf and Spark2, in favor of
conda-analytics and Spark3.
Note that this is just a deprecation: you will still be able to use your
existing conda environments running on top of anaconda-wmf and Spark2.
*When:*
We are proposing the following window for these changes:
Wednesday 30 Nov 2022 12:30 to 13:30 UTC.
(7:30 AM ET / 4:30 AM PT)
*More info:*
Spark3 upgrade:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migrati…
The old anaconda-wmf base conda environment:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Conda
The new conda-analytics base conda environment:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/conda-analytics
wmfdata changes:
https://github.com/wikimedia/wmfdata-python/blob/main/CHANGELOG.md
Please let us know if you have any questions or objections.
--
Xabriel J. Collazo Mojica (he/him, pronunciation
<https://commons.wikimedia.org/wiki/File:Xabriel_Collazo_Mojica_-_pronunciat…>
)
Sr Software Engineer
Wikimedia Foundation
Hello y'all!
Tomorrow or next week, we will release *version 2.0 of the Wmfdata-Python
library* for accessing data in the internal Wikimedia analytics cluster.
If you import Wmfdata, you will see a message asking you to update.
Wmfdata 2.0 has lots of improvements, but as a new major version, it also
has some *breaking changes*, which means your existing code may need some
changes to run properly. These changes are pretty simple, but still, don't
update if you are working to an urgent deadline!
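If you are not sure which version a notebook currently uses, you can check
before deciding whether to update (a sketch, assuming only the standard
__version__ attribute):

    # Run in the environment the notebook uses. Importing an out-of-date
    # Wmfdata also prints the update message mentioned above.
    import wmfdata
    print(wmfdata.__version__)  # anything below 2.0.0 predates the breaking changes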
For more information on what is changing, see the change log
<https://github.com/wikimedia/wmfdata-python/blob/main/CHANGELOG.md>.
The quickstart
notebook
<https://github.com/wikimedia/wmfdata-python/blob/main/docs/quickstart.ipynb>
has also been massively improved, so it now gives a comprehensive introduction
to Wmfdata's features.
As always, let me know if you have any questions!
-----
Neil Shah-Quinn
senior data scientist, Product Analytics
<https://www.mediawiki.org/wiki/Product_Analytics>
Wikimedia Foundation <https://wikimediafoundation.org/>