Hello!
tl;dr we will be removing Python and other Debian packages installed for ad-hoc usage. https://phabricator.wikimedia.org/T275786
Now that we've got conda environments https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda, and the ability to ship them to worker nodes https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#PySpark_and_wmfdata, it should be much easier for Analytics Cluster users to install and use the dependencies they need.
Previously, when someone needed a Python (or other) dependency, they'd file a request in Phabricator (e.g. T197896 https://phabricator.wikimedia.org/T197896) and Analytics team SREs would either install an existing Python Debian package, or figure out how to build a Debian package for it.
There are numerous Python and other Debian packages installed on stat boxes and across the Hadoop workers (see https://github.com/wikimedia/puppet/tree/production/modules/profile/manifests/analytics/cluster/packages). We'd like to stop maintaining these and remove them. We'll be doing so over the next few weeks. If you run into any issues with Python dependencies, let us know here: https://phabricator.wikimedia.org/T275786.
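If you'd like to check ahead of time whether your code currently picks up any of these Debian-installed Python modules, here is a rough sketch. It relies on one assumption: that Debian's apt-managed Python modules live under a dist-packages directory (the standard Debian layout), while conda environments use site-packages. The helper names (module_origin, is_debian_path, audit) are made up for this example and are not part of any existing tool.

```python
import importlib.util

# Assumption: Debian installs apt-managed Python modules under a
# "dist-packages" directory; conda environments use "site-packages".
DEBIAN_MARKER = "dist-packages"


def is_debian_path(path):
    """True if the given file path sits under a dist-packages directory."""
    return path is not None and DEBIAN_MARKER in path.split("/")


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


def audit(names):
    """Map each module name to True if it resolves to a Debian-managed path."""
    return {name: is_debian_path(module_origin(name)) for name in names}


if __name__ == "__main__":
    # Hypothetical list: replace with the modules your own code imports.
    for name, from_debian in audit(["numpy", "pandas", "requests"]).items():
        print(name, "-> Debian package" if from_debian else "-> not from Debian")
```

Run it with the same interpreter or kernel that your jobs use. Any module reported as coming from a Debian package is one you would probably want to install into your conda environment before the removal.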
Thanks! -Andrew Otto SRE, Data Engineering