Hello!
tl;dr: We'd like to turn off Jupyter+Virtualenv (SWAP) in favor of
Jupyter+Conda (Newpyter) the week of May 3rd. Please help us test and
switch before then.
Over the last year, we've slowly been working on replacing the current
virtualenv-based JupyterHub system (formerly known as SWAP) with a new one
based on Conda <https://docs.conda.io/en/latest/> (AKA Newpyter).
Everything is in place to switch over and decommission the virtualenv-based
system you're all used to. Before we do, we need to make sure you've all
tried the new setup and are OK with it!
We'd like to decommission Jupyter+Virtualenv (running on port 8000) the
week of May 3rd. In the meantime, please switch to Jupyter+Conda on port
8880. The documentation has been updated.
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter>
Summary of the changes:
- You will ssh tunnel to port 8880
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Access>
instead of port 8000.
- Your Notebook files will remain unchanged.
- Your local data files will remain unchanged.
- Your Python environment will change, so you may need to re-install
packages. See docs here
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Conda_environ…>
and here
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda#Installing_p…>
.
- The PySpark, Scala Spark, Spark SQL, and Spark R kernels will be
removed. If you currently use the PySpark kernels, please port your
notebooks to a regular Python kernel and use wmfdata-python to launch your
SparkSession. Docs here
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#PySpark>.
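For those porting PySpark-kernel notebooks, here is a minimal sketch of launching a SparkSession from a regular Python kernel with wmfdata-python. The `get_session` helper name and its `app_name` argument are assumptions on my part; please check the linked Jupyter docs for the current API.

```python
# Sketch only: launching Spark from a regular Python kernel via wmfdata-python.
# The get_session() name and arguments are assumptions; see the linked docs.
import wmfdata

# Create (or attach to) a YARN-backed SparkSession
spark = wmfdata.spark.get_session(
    app_name="my-notebook",  # hypothetical application name
)

# From here, use the session as you would have in the old PySpark kernel
df = spark.sql("SELECT 1 AS test")
df.show()
```

This runs only on the analytics clients, where wmfdata-python and the Hadoop configuration are available.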
Please reach out with any questions, and report issues on this ticket
<https://phabricator.wikimedia.org/T224658>. If we encounter any blockers
along the way, we will postpone the May 3rd deadline.
Thank you!
- Andrew Otto + Data Engineering
Hello!
We'd like to stop refining the mediawiki.job.* queue streams into the Hive
event database. This means that the event.mediawiki_job_* tables will be
removed.
I don't expect that anyone actually uses them, but if there are objections,
please let us know at https://phabricator.wikimedia.org/T281605.
If there are no objections, I will stop the refinement and remove the
tables the week of May 10.
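If you want to check whether any of your workflows touch these tables, a hedged sketch using wmfdata-python's Hive helper (the `hive.run` name is an assumption; check the wmfdata docs):

```python
# Sketch only: list the job tables slated for removal.
# wmfdata.hive.run() is assumed from the wmfdata-python docs.
import wmfdata

# Hive wildcard pattern matching the tables mentioned in this announcement
tables = wmfdata.hive.run("SHOW TABLES IN event LIKE 'mediawiki_job_*'")
print(tables)
```

As above, this only runs on the analytics clients with cluster access.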
Hi everybody,
We are going to change the Yarn scheduler in a bit, moving from Fair to
Capacity. More info at
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop#Queues
It shouldn't impact ongoing jobs, but we will probably need to do some
tuning/adjustment over the next few days. If you see any weird behavior,
please ping us or comment on T277062.
A gift for your patience: this change will let users select Hadoop nodes
with GPUs when launching jobs :) (we have six, with the same GPUs as
stat100[5,8]).
Luca
Hi everybody,
Tomorrow EU morning (9:00 CET) I'll upgrade the Debian OS on an-coord1001,
and several services will need to be stopped: Hive, Oozie, and Presto
(this may also affect Druid, and hence Turnilo and Superset). The total
downtime should be around one hour (best-case scenario).
Please let me know in https://phabricator.wikimedia.org/T278424 if this
impacts you (in case we can find another time window for the downtime).
Thanks!
Luca (on behalf of the Analytics / Data Engineering team)
Hi everybody,
we are going to make hue-next.wikimedia.org (freshly updated to 4.9, the
latest upstream release) the new hue.wikimedia.org. This means that:
1) We'll finally remove the last dependency on the Cloudera CDH packages
that we still have in our APT repositories (not good to keep after
https://www.cloudera.com/downloads/paywall-expansion.html). The current
hue.wikimedia.org backend is still running a very old version of Hue on
Python 2.
2) We'll move Hue's user management to be fully automated via CAS, so
there will be no more requests to the Analytics team to create new users.
Every new user should be able to use Hue straight away without any extra
ping (besides the usual ones for wmf/nda LDAP membership, of course).
The main downside is that there are some bugs (GitHub issues have already
been filed upstream) that may cause a poor user experience; see
https://phabricator.wikimedia.org/T264896. Because of 1), we'll need to
proceed ASAP, but if anybody is interested in following up on those GitHub
issues, feel free to :)
If nobody opposes, I'll make the switch tomorrow, Apr 15th, during the EU
morning (there will be a little downtime, but hopefully limited to one
hour).
For any questions or follow up, please ping me on IRC or add a note in the
aforementioned task!
Luca
[also sent to ops@list earlier]
Hi all,
The current bastion host for eqiad (bast1002.wikimedia.org) is five years
old and is being replaced by a new server (bast1003.wikimedia.org).
Please adapt your SSH client configs; I've also updated wmf-laptop-sre
(0.5.1) to use the new server.
bast1002.wikimedia.org will stick around for another week.
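For anyone updating their SSH client configuration, a minimal sketch follows. Only the bastion hostname comes from this announcement; the `*.eqiad.wmnet` host pattern and the username are placeholders.

```
# ~/.ssh/config
Host bast1003.wikimedia.org
    User your-shell-username        # placeholder

# Hypothetical example: jump through the new bastion to internal hosts
Host *.eqiad.wmnet
    User your-shell-username        # placeholder
    ProxyJump bast1003.wikimedia.org
```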
Cheers,
Moritz
Hi all,
After much ado, https://superset.wikimedia.org is running Superset 1.0!
Read more about the changes at the release announcement
<https://www.preset.io/blog/2021-01-18-superset-1-0/>.
If you find any regressions, comment on the Phabricator task at
https://phabricator.wikimedia.org/T272390 or ping razzi on IRC in
#wikimedia-analytics.
Kudos to the Superset contributors who actually built the app, and to
Andrew and Luca for helping me get it released!
Regards,
Razzi & Analytics / Data Engineering