TL;DR: In https://phabricator.wikimedia.org/T170826
the Analytics team
wants to add base firewall rules to stat100x and notebook100x hosts, that
will cause any non-localhost or known traffic to be blocked by default.
Please let us know in the task if this is a problem for you.
the Analytics team has always left the stat100x and notebook100x hosts
without a set of base firewall rules to avoid impacting any
research/test/etc.. activity on those hosts. This choice has a lot of
downsides, one of the most problematic ones is that usually environments
like the Python venvs can install potentially any package, and if the owner
does not pay attention to security upgrades then we may have a security
problem if the environment happens to bind to a network port and accept
traffic from anywhere.
One of the biggest problems was Spark: when somebody launches a shell using
Hadoop Yarn (--master yarn), a Driver component is created that needs to
bind to a random port to be able to communicate with the workers created on
the Hadoop cluster. We assumed that instructing Spark to use a predefined
range of random ports was not possible, but in
we discovered that there is a way
(that seems to work fine from our tests). The other big use case that we
know, Jupyter notebooks, seems to require only localhost traffic flow
Please let us know in the task if you have a use case that requires your
environment to bind to a network port on stat100x or notebook100x and
accept traffic from other hosts. For example, having a python app that
binds to port 33000 on stat1007 and listens/accepts traffic from other stat
or notebook hosts.
If we don't hear anything, we'll start adding base firewall rules to one
host at the time during the upcoming weeks, tracking our work on the
Luca (on behalf of the Analytics team)