Hello folks,
*What:*
We'd like to schedule a maintenance window for all the stat machines:
stat1004.eqiad.wmnet
stat1005.eqiad.wmnet
stat1006.eqiad.wmnet
stat1007.eqiad.wmnet
stat1008.eqiad.wmnet
*In this maintenance window, we will:*
* Upgrade the conda-analytics debian package. This package is how we deploy
Spark3. The upgrade will allow us to run JupyterHub and JupyterLab on top
of it.
* Deploy new puppet configurations that will switch from running Jupyter on
top of anaconda-wmf to running on top of conda-analytics.
* This effectively upgrades our Jupyter deployment as follows:
* Upgrades JupyterHub from 1.1.0 to 1.5.0.
* Upgrades JupyterLab from 3.2.9 to 3.4.8.
* Upgrades Spark on newly created conda environments from 2.4.4 to
3.1.2.
* Upgrades wmfdata on newly created conda environments to 2.0.0. This
version of wmfdata includes breaking changes.
*IMPORTANT*: After this maintenance window, you should expect the following:
*** JupyterHub and any existing JupyterLab processes, including any running
kernels, will be shut down and will have to be restarted manually by their
respective owners. ***
*** JupyterLab UI will have minor changes. ***
*** New conda environments created via JupyterHub will now be based off of
conda-analytics and will utilize Spark3. ***
*** Existing conda environments based off of anaconda-wmf will continue to
work, and continue to utilize Spark2. ***
*Why:*
As part of our effort to deprecate Spark2 and make Spark3 widely available,
we are deprecating the use of anaconda-wmf and Spark2, in favor of
conda-analytics and Spark3.
Note that this is just a deprecation: you will still be able to use your
existing conda environments running on top of anaconda-wmf and Spark2.
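In practical terms, the split described above means an environment's Spark major version tells you which world it lives in. A minimal sketch of that check (the `spark_major` helper is ours for illustration, not part of any WMF tooling; in a real notebook the version string would come from `pyspark.__version__`):

```python
def spark_major(version: str) -> int:
    """Return the major component of a Spark version string, e.g. '3.1.2' -> 3."""
    return int(version.split(".")[0])

# The versions named in this announcement:
assert spark_major("2.4.4") == 2  # existing anaconda-wmf environments
assert spark_major("3.1.2") == 3  # new conda-analytics environments
```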
*When:*
We are proposing the following window for these changes:
Wednesday 30 Nov 2022 12:30 to 13:30 UTC.
(7:30 AM ET / 4:30 AM PT)
*More info:*
Spark3 upgrade:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migrati…
The old anaconda-wmf base conda environment:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Conda
The new conda-analytics base conda environment:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/conda-analytics
wmfdata changes:
https://github.com/wikimedia/wmfdata-python/blob/main/CHANGELOG.md
Please let us know if you have any questions or objections.
--
Xabriel J. Collazo Mojica (he/him, pronunciation
<https://commons.wikimedia.org/wiki/File:Xabriel_Collazo_Mojica_-_pronunciat…>
)
Sr Software Engineer
Wikimedia Foundation
Hello y'all!
Tomorrow or next week, we will release *version 2.0 of the Wmfdata-Python
library* for accessing data in the internal Wikimedia analytics cluster.
If you import Wmfdata, you will see a message asking you to update.
Wmfdata 2.0 has lots of improvements, but as a new major version, it also
has some *breaking changes*, which means your existing code may need some
changes to run properly. These changes are pretty simple, but still, don't
update if you are working to an urgent deadline!
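The "new major version" caveat above is the usual semantic-versioning rule: a major bump (1.x to 2.0) signals that existing code may need changes, while minor bumps should be safe. A minimal sketch of that rule, assuming version strings of the usual `major.minor.patch` form (the `may_break` helper is ours, not part of Wmfdata):

```python
def may_break(installed: str, latest: str) -> bool:
    """True when upgrading from `installed` to `latest` crosses a major
    version boundary, i.e. when the upgrade may contain breaking changes."""
    return installed.split(".")[0] != latest.split(".")[0]

assert may_break("1.3.1", "2.0.0") is True   # major bump: read the changelog first
assert may_break("2.0.0", "2.1.0") is False  # minor bump: safe to update
```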
For more information on what is changing, see the change log
<https://github.com/wikimedia/wmfdata-python/blob/main/CHANGELOG.md>.
The quickstart
notebook
<https://github.com/wikimedia/wmfdata-python/blob/main/docs/quickstart.ipynb>
has also been massively improved so it gives a comprehensive introduction
to Wmfdata's features.
As always, let me know if you have any questions!
-----
Neil Shah-Quinn
senior data scientist, Product Analytics
<https://www.mediawiki.org/wiki/Product_Analytics>
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
Looking for a time to reboot two of our analytics explorer (stat) servers
for a kernel upgrade.
These are stat1005.eqiad.wmnet and stat1008.eqiad.wmnet.
I would like to handle the reboots on *Tuesday 15th November 2022 between
06:00 UTC and 06:30 UTC*.
Kindly let me know if a maintenance window within these times would cause
an inconvenience, and I can push back the reboots to accommodate your needs.
--
Best,
Steve Munene
Hi Everyone,
The Data Engineering team is upgrading to Spark 3 and will no longer be
supporting Spark 2 jobs on the Hadoop cluster after March 31st, 2023. If
your team owns Spark 2 jobs in production, please plan for the time needed
to upgrade your jobs. For all future work, use Spark 3.
You can find more information about the upgrade on:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migrati….
Please add any missing jobs to the migration list on that page. If you need
help from the data engineering team you can reach out to Jackeline Argüello
<jarguello-ctr(a)wikimedia.org> or join us for the data engineering office
hours.
--
*Olja Dimitrijevic* (she/her)
Director of Data Engineering
Wikimedia Foundation <https://wikimediafoundation.org/>