TL;DR: In https://phabricator.wikimedia.org/T170826 the Analytics team wants to add base firewall rules to the stat100x and notebook100x hosts, so that any traffic that is neither localhost nor otherwise known is blocked by default. Please let us know in the task if this is a problem for you.
Hi everybody,
The Analytics team has always left the stat100x and notebook100x hosts without a set of base firewall rules, to avoid impacting any research, test, or similar activity on those hosts. This choice has a lot of downsides. One of the most problematic is that environments like Python venvs can install practically any package; if the owner does not keep up with security upgrades and the environment happens to bind to a network port and accept traffic from anywhere, we may have a security problem.
One of the biggest problems was Spark: when somebody launches a shell using Hadoop YARN (--master yarn), a Driver component is created that needs to bind to a random port to be able to communicate with the workers created on the Hadoop cluster. We assumed that instructing Spark to use a predefined range of random ports was not possible, but in https://phabricator.wikimedia.org/T170826 we discovered that there is a way (that seems to work fine from our tests). The other big use case that we know of, Jupyter notebooks, seems to require only unrestricted localhost traffic.
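As an illustration of how a bounded port range for the Spark driver could look, here is a minimal sketch. It assumes the mechanism is Spark's standard `spark.driver.port` and `spark.port.maxRetries` properties (Spark tries the base port first and adds 1 on each retry); the thread does not say which settings T170826 actually uses, and the port numbers and helper names below are hypothetical.

```python
# Hypothetical values for illustration only; not taken from T170826.
DRIVER_BASE_PORT = 12000
MAX_PORT_RETRIES = 100


def spark_port_conf(base_port, max_retries):
    """Return Spark properties that pin the driver to a bounded port range."""
    return {
        "spark.driver.port": str(base_port),
        "spark.port.maxRetries": str(max_retries),
    }


def candidate_ports(base_port, max_retries):
    """Ports Spark may try: the base port, then one port higher per retry."""
    return range(base_port, base_port + max_retries + 1)


def as_cli_flags(conf):
    """Render the properties as --conf flags for spark-submit or pyspark."""
    return [f"--conf {key}={value}" for key, value in sorted(conf.items())]


conf = spark_port_conf(DRIVER_BASE_PORT, MAX_PORT_RETRIES)
ports = candidate_ports(DRIVER_BASE_PORT, MAX_PORT_RETRIES)
print(" ".join(as_cli_flags(conf)))
print(f"firewall must allow driver ports {ports[0]}-{ports[-1]}")
```

With a range like this, a firewall rule only has to allow a known block of ports instead of any port. The driver's block manager listens on a separate port (`spark.driver.blockManager.port`), which would presumably need the same treatment.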
Please let us know in the task if you have a use case that requires your environment to bind to a network port on stat100x or notebook100x and accept traffic from other hosts. For example, a Python app that binds to port 33000 on stat1007 and accepts traffic from other stat or notebook hosts.
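To check whether an app falls into this category, the bind address is the thing to look at. The following is a minimal sketch using only the Python standard library; the helper name and the sample addresses are hypothetical and not taken from the task.

```python
import ipaddress


def accepts_remote_traffic(bind_address):
    """True if a socket bound to this IP address could be reached from other hosts."""
    # Loopback addresses (127.0.0.0/8, ::1) only accept local connections.
    return not ipaddress.ip_address(bind_address).is_loopback


# Sample bind addresses; 10.0.0.5 stands in for a host's own network address.
for address in ("127.0.0.1", "::1", "0.0.0.0", "10.0.0.5"):
    kind = "affected" if accepts_remote_traffic(address) else "localhost only"
    print(f"{address:>10}: {kind}")
```

An app bound to 127.0.0.1 or ::1 should keep working unchanged, while one bound to 0.0.0.0 or to the host's own address is the kind of use case worth reporting in the task.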
If we don't hear anything, we'll start adding base firewall rules to one host at a time over the upcoming weeks, tracking our work on the aforementioned task.
Thanks!
Luca (on behalf of the Analytics team)
Hey Luca, We discussed this in Research and it all sounds good to us with one question below. If something else arises, we'll ping you. Thanks for the heads up!
> We assumed that instructing Spark to use a predefined range of random ports was not possible, but in https://phabricator.wikimedia.org/T170826 we discovered that there is a way (that seems to work fine from our tests).
Will we need to change anything in our configuration or will this be automatic?
Best, Isaac
On Fri, Jul 5, 2019 at 4:36 AM Luca Toscano ltoscano@wikimedia.org wrote:
[...]
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Isaac,
On Wed, Jul 10, 2019 at 4:14 PM Isaac Johnson isaac@wikimedia.org wrote:
> Will we need to change anything in our configuration or will this be automatic?
On the stat hosts the change is already live, and your new Spark sessions will pick it up automatically. On the notebooks we'll need to restart the Spark sessions before enabling the firewall. I am planning to contact all the owners of a Spark session on notebook100[3,4], so if you see an email from me there will be an action to take; otherwise there is nothing to do :)
Luca
Sounds perfect Luca -- thanks for the clarification!
On Wed, Jul 10, 2019 at 9:20 AM Luca Toscano ltoscano@wikimedia.org wrote:
[...]