Hi everyone!
*tl;dr stop using notebook1001 by Monday April 2nd, use notebook1003 instead.*
*(If you don’t have production access, you can ignore this email.)*
As part of https://phabricator.wikimedia.org/T183145, we’ve ordered new hardware to replace the aging notebook1001. The new servers are ready to go, so we need to set a deprecation date for notebook1001. That date is Monday, April 2nd. After that, your work on notebook1001 will no longer be accessible. Instead, you should use notebook1003 (or notebook1004).
But there is good news too! Last week I rsynced everyone’s home directories from notebook1001 over to notebook1003. I also upgraded the default virtualenv your notebooks run from. Your notebook files should all be accessible on notebook1003. However, the version of Python 3 changed from 3.4 to 3.5 during this upgrade. Dependencies that your notebooks use and that you installed yourself on notebook1001 may not be available at first. You might need to pip install those dependencies again into the new notebook Python 3.5 virtualenv. (I can’t really give you explicit instructions to do that, as I don’t know what you use for your notebooks.)
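One way to find out which dependencies need reinstalling is to check, from a notebook cell, which modules the new kernel can actually import. The following is only a minimal sketch: the `wanted` list is a placeholder for whatever your own notebooks import, and `missing_modules` and `pip_install_command` are hypothetical helper names, not something installed on notebook1003.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install"] + list(packages)


# Placeholder list: replace with the modules your notebooks actually import.
wanted = ["pandas", "scipy", "requests"]

print("Kernel Python version:", sys.version.split()[0])
missing = missing_modules(wanted)
if missing:
    # Printed rather than executed, so nothing gets installed by accident.
    print("Try:", " ".join(pip_install_command(missing)))
else:
    print("All listed modules are importable.")
```

Keep in mind that an import name is not always the same as the name pip expects, so treat the printed command as a starting point rather than something to paste blindly.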
I’ll do a final rsync of any newer files in home directories from notebook1001 on Monday, April 2nd. If you’ve been working on notebook1001 since March 15th, this should get everything up to date on notebook1003 before notebook1001 goes away. BUT! *Do not work on both notebook1001 and notebook1003*! My final rsync will keep the most recently modified version of files from either server, so changes in the older copy of a file would be lost.
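If you are unsure whether you touched anything on a given server since March 15th, a quick listing of recently modified files can help. This is a rough sketch and not part of the migration itself: `modified_since` is a hypothetical helper, and the cutoff year is taken from the dates in this thread.

```python
import os
from datetime import datetime


def modified_since(root, cutoff):
    """Yield (path, mtime) for files under `root` modified at or after `cutoff`."""
    cutoff_ts = cutoff.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff_ts:
                yield path, mtime


# Example use from a notebook cell or terminal (walks your whole home directory):
# for path, mtime in sorted(modified_since(os.path.expanduser("~"), datetime(2018, 3, 15))):
#     print(datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:%M"), path)
```

Running it on both servers and comparing the output would show any file that was edited in both places, which is the case to avoid.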
OOooOo and there’s even more good news! I’ve made the notebooks able to access system site packages, and installed a ton of useful packages by default (pandas, scipy, requests, etc.): https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/packages.pp#L77-L98. If there’s something else you think you might need, let us know. Or just pip install it into your notebook.
Additionally, pyhive has been installed too, so you should be able to more easily access Hive directly from a Python notebook.
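For anyone who has not used pyhive before, a minimal sketch of querying Hive from a notebook might look like the following. The hostname in the usage comment is a placeholder (the real server name is not given in this email), port 10000 is only the usual HiveServer2 default, and `list_databases` and `connect_to_hive` are hypothetical helper names.

```python
def list_databases(cursor):
    """Run SHOW DATABASES on a DB-API cursor and return the database names."""
    cursor.execute("SHOW DATABASES")
    return [row[0] for row in cursor.fetchall()]


def connect_to_hive(host, port=10000, username=None):
    """Open a pyhive connection (imported lazily so the helper above works without it)."""
    from pyhive import hive

    return hive.connect(host=host, port=port, username=username)


# Example use; the hostname is a placeholder, not a real server name:
# conn = connect_to_hive("hive-server.example.org")
# print(list_databases(conn.cursor()))
```

Because the cursor follows the standard Python DB-API, the same `execute` and `fetchall` pattern applies to ordinary SELECT queries as well.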
I’ve updated the docs at https://wikitech.wikimedia.org/wiki/SWAP#Usage; please take a look.
If you have any questions, please don’t hesitate to ask, either here or on Phabricator: https://phabricator.wikimedia.org/T183145.
- Andrew Otto & Analytics Engineering
Hi Andrew & AE,
Thanks! Python kernel seems to be working well but the R kernel keeps dying.
Looking forward to trying out pyhive!
- Mikhail
On Thu, Mar 22, 2018 at 12:34 PM, Andrew Otto otto@wikimedia.org wrote:
But there is good news too! Last week I rsynced everyone’s home directories from notebook1001 over to notebook1003.
Thanks for doing this.
OOooOo and there’s even more good news! I’ve made the notebooks able to access system site packages, and installed a ton of useful packages by default. pandas, scipy, requests, etc.
❤ this.
And thanks for giving notebooks some love. They are key to the reproducibility of what we do here. Within Research, we will soon pick up the conversation about how to do more effective documentation via notebooks, as they are much closer to our workflows and allow us to write more useful documentation.
Leila
Oh, I forgot one thing! JupyterLab is now available too! It isn’t (yet) the default, but if you are able, try it out instead of regular old Jupyter. To do so, navigate to http://localhost:8000/user/<username>/lab
I have just done a final rsync of home directories from notebook1001 over to notebook1003.
*Do not use notebook1001 anymore.*
I will leave notebook1001 online for another day in case there are issues, but plan to start the decom process this week.
Thanks all!
Thanks Andrew! I double-checked my folder and verified that all notebook files were copied over correctly.
However, it is also worth mentioning that the kernel state is not preserved after the transfer, i.e. all running notebooks are stopped. This means for example that any variable values (say query results that are stored in a dataframe) will need to be recalculated or restored from e.g. a CSV or pickle file. It's good practice to save important data in that form anyway (notebooks can stop running for other reasons too, although they have usually stayed live for many days or weeks). Still, I can see an argument for holding off the decommissioning just a little longer, until say early next week, if that doesn't disrupt other things.
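As an illustration of the save-and-restore habit described above, here is a minimal sketch using only the standard library. `save_checkpoint` and `load_checkpoint` are hypothetical helper names, and the file name in the usage comment is a placeholder.

```python
import pickle


def save_checkpoint(obj, path):
    """Write `obj` (for example a query result) to `path` as a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_checkpoint(path):
    """Read back an object written by save_checkpoint. Only load files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Example use in a notebook; the file name is a placeholder:
# save_checkpoint(results, "query_results.pkl")
# results = load_checkpoint("query_results.pkl")
```

For dataframes specifically, pandas also offers `DataFrame.to_csv` and `DataFrame.to_pickle`, with `pandas.read_csv` and `pandas.read_pickle` to restore them. CSV is the safer choice if the file needs to outlive a library upgrade.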
In any case, +1 to what Leila said - I really appreciate the technical support for SWAP and am excited about the additional possibilities that this upgrade is bringing.
Thanks Andrew and AE!!! ❤
JupyterLab works like a charm! And I installed the R kernel under my account, it works well too! 🎉
Just one issue, I don't seem to be able to access Spark via notebook1003. I can open a pyspark shell in the terminal on JupyterLab, but `SHOW DATABASES` returns 'default', instead of a list of databases on hadoop.
Chelsy
Tilman, aye ok thanks. I will delay decom til next week.
Chelsy, yeah! There should have been a hive-site.xml symlink created for spark2 when it was installed. I’ll have to look into why that didn’t happen next time I poke around in the spark2 Debian package, probably when we upgrade to Spark 2.3. In the meantime, I’ve manually created the symlink, so you should be able to SHOW DATABASES and see all your favorite friends now.
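Some background on the symptom: when Spark finds no hive-site.xml in its configuration directory, it generally falls back to a local metastore, which is consistent with seeing only 'default'. If it ever recurs, a quick check along these lines may help narrow it down. This is a sketch under assumptions: `hive_site_status` is a hypothetical helper, and the actual configuration directory on notebook1003 is not stated in this thread, so it is read from the `SPARK_CONF_DIR` environment variable if that happens to be set.

```python
import os


def hive_site_status(conf_dir):
    """Classify hive-site.xml in `conf_dir` as 'missing', 'file', 'symlink' or 'broken symlink'."""
    path = os.path.join(conf_dir, "hive-site.xml")
    if os.path.islink(path):
        # os.path.exists follows the link, so it is False when the target is gone.
        return "symlink" if os.path.exists(path) else "broken symlink"
    return "file" if os.path.isfile(path) else "missing"


conf_dir = os.environ.get("SPARK_CONF_DIR")
if conf_dir:
    print(conf_dir, "->", hive_site_status(conf_dir))
else:
    print("SPARK_CONF_DIR is not set; pass the Spark conf directory explicitly.")
```

A result of 'missing' or 'broken symlink' would match the behaviour Chelsy reported.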
Thank you Andrew!
FYI, I have started the decommission of notebook1001; it will no longer be accessible.