Thank you Andrew!

On Thu, Apr 5, 2018 at 7:17 AM, Andrew Otto <> wrote:
Tilman, aye ok thanks.  I will delay decom til next week.

Chelsy yeah!  There should have been a hive-site.xml symlink created for spark2 when it was installed.  I’ll have to look into why that didn’t happen next time I poke around in the spark2 debian package, probably when we upgrade to spark 2.3.  In the meantime, I’ve manually created the symlink, so you should be able to SHOW DATABASES and see all your favorite friends now.
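For reference, the fix amounts to making sure a `hive-site.xml` symlink exists in the spark2 conf directory. A minimal Python sketch of that idea (the paths and function name are illustrative, not what the spark2 debian package actually does):

```python
from pathlib import Path

def ensure_hive_site(spark_conf_dir, hive_site_path):
    """Create a hive-site.xml symlink in the Spark conf dir if it is missing.

    spark_conf_dir and hive_site_path are placeholders; on a real cluster
    these would be the spark2 conf dir and the cluster's Hive config file.
    """
    link = Path(spark_conf_dir) / "hive-site.xml"
    if not link.exists():
        link.symlink_to(hive_site_path)
    return link
```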

On Tue, Apr 3, 2018 at 1:13 PM, Chelsy Xie <> wrote:
Thanks Andrew and AE!!! ❤

JupyterLab works like a charm! And I installed the R kernel under my account, it works well too! 🎉

Just one issue: I don't seem to be able to access Spark via notebook1003. I can open a pyspark shell in the terminal on JupyterLab, but `SHOW DATABASES` returns only 'default' instead of the full list of databases on Hadoop.
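For anyone debugging the same symptom: whether a pyspark session sees the cluster's databases depends on Hive support being enabled and a `hive-site.xml` being on Spark's conf path. A minimal sketch (the import is deferred so the snippet loads even off-cluster; the app name is just an example):

```python
def hive_enabled_session(app_name="check-hive"):
    # Deferred import so this sketch can be defined even where pyspark
    # is not installed.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .enableHiveSupport()  # needs hive-site.xml on Spark's conf path
            .getOrCreate())
```

If `hive_enabled_session().sql("SHOW DATABASES").show()` still lists only `default`, the session is falling back to a local Derby metastore instead of the cluster one.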


On Mon, Apr 2, 2018 at 8:31 PM, Tilman Bayer <> wrote:
Thanks Andrew! I double-checked my folder and verified that all notebook files were copied over correctly.

However, it is also worth mentioning that the kernel state is not preserved after the transfer, i.e. all running notebooks are stopped. This means for example that any variable values (say query results that are stored in a dataframe) will need to be recalculated or restored from e.g. a CSV or pickle file. It's good practice to save important data in that form anyway (notebooks can stop running for other reasons too, although they have usually stayed live for many days or weeks). Still, I can see an argument for holding off the decommissioning just a little longer, until say early next week, if that doesn't disrupt other things.
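The save-and-restore pattern mentioned above can be as simple as pickling whatever is expensive to recompute before the kernel stops, then loading it back in the fresh kernel (the data below is made up for illustration):

```python
import os
import pickle
import tempfile

# Hypothetical query results held in memory (e.g. rows pulled from Hive)
results = [{"page": "Main_Page", "views": 12345}]

# Save before the kernel stops...
path = os.path.join(tempfile.gettempdir(), "results.pkl")
with open(path, "wb") as f:
    pickle.dump(results, f)

# ...and restore in a fresh kernel after the transfer
with open(path, "rb") as f:
    restored = pickle.load(f)
```

CSV works just as well for tabular data and has the advantage of being readable outside Python.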

In any case, +1 to what Leila said - I really appreciate the technical support for SWAP and am excited about the additional possibilities that this upgrade is bringing.

On Mon, Apr 2, 2018 at 7:29 AM, Andrew Otto <> wrote:
I have just done a final rsync of home directories from notebook1001 over to notebook1003.  

Do not use notebook1001 anymore.

I will leave notebook1001 up only for another day in case there are issues, but plan to start the decom process this week.

Thanks all!

On Thu, Mar 22, 2018 at 3:34 PM, Andrew Otto <> wrote:
Hi everyone!

tl;dr stop using notebook1001 by Monday April 2nd, use notebook1003 instead.

(If you don’t have production access, you can ignore this email.)

As part of, we’ve ordered new hardware to replace the aging notebook1001.  The new servers are ready to go, so we need to schedule a deprecation timeline for notebook1001.  That timeline is Monday April 2nd.  After that, your work on notebook1001 will no longer be accessible.  Instead you should use notebook1003 (or notebook1004).

But there is good news too!  Last week I rsynced everyone’s home directories from notebook1001 over to notebook1003.  I also upgraded the default virtualenv your notebooks run from.  Your notebook files should all be accessible on notebook1003.  However, the version of Python3 changed from 3.4 to 3.5 during this upgrade.  Dependencies that your notebooks use that you installed on notebook1001 may not be available at first.  You might need to re-run pip install for those dependencies in the new notebook Python 3.5 virtualenv.  (I can’t really give you explicit instructions to do that, as I don’t know what you use for your notebooks.)
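One quick way to find out what needs reinstalling is to check which of your usual imports resolve in the new virtualenv. A sketch (the package names are examples; list whatever your own notebooks import):

```python
import importlib.util

# Example dependency list; substitute the packages your notebooks use.
needed = ["pandas", "requests", "matplotlib"]

missing = [pkg for pkg in needed if importlib.util.find_spec(pkg) is None]
if missing:
    # Run this in a notebook terminal to restore them:
    print("pip install " + " ".join(missing))
```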

I’ll do a final rsync of any newer files in home directories from notebook1001 on Monday April 2nd.  If you’ve been working on notebook1001 after March 15th, this should get everything up to date on notebook1003 before notebook1001 goes away.  BUT!  Do not work on both notebook1001 and notebook1003!  My final rsync will keep the most recently modified version of each file from either server.

OOooOo and there’s even more good news!  I’ve made the notebooks able to access system site packages, and installed a ton of useful packages by default.  pandas, scipy, requests, etc.  If there’s something else you think you might need, let us know.  Or just pip install it into your notebook.

Additionally, pyhive has been installed too, so you should be able to more easily access Hive directly from a python notebook.  
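As a sketch of what that looks like (the host is a placeholder, not the actual cluster address; the import is deferred so the snippet loads without pyhive installed):

```python
def hive_cursor(host="hiveserver.example.org", port=10000):
    """Open a cursor against HiveServer2.

    host is an illustrative placeholder; 10000 is HiveServer2's
    conventional Thrift port.
    """
    from pyhive import hive  # deferred so this sketch loads without pyhive
    return hive.connect(host=host, port=port).cursor()
```

Typical use from a notebook cell would then be something like `cur = hive_cursor(); cur.execute("SHOW DATABASES"); cur.fetchall()`.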

I’ve updated docs at, please take a look.

If you have any questions, please don’t hesitate to ask, either here on or phabricator:

- Andrew Otto & Analytics Engineering

Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

Chelsy Xie
Data Analyst
Wikimedia Foundation
