Hi all!

We’ve recently made Spark 2.1 available in the Analytics Hadoop cluster. It is installed on stat1004 and stat1005 alongside Spark 1.6. To use Spark 2, run the spark2* executables (and pyspark2) instead of the usual spark-shell, spark-submit, pyspark, etc., which still launch Spark 1.6.

I’ve added some documentation about this on Wikitech.

We’d like to deploy Spark 2.2, but it no longer supports Java 7, so we first need to upgrade Hadoop to run on Java 8. Hopefully this will happen in early 2018.

analytics/refinery/source still uses Spark 1, but we’d like to migrate its jobs and dependencies to Spark 2 soon as well.

Anyway, let me know if there are any questions. Enjoy!

- Andrew Otto

Systems Engineer, WMF