Hi all!
We’ve recently made Spark 2.1 available in the Analytics Hadoop cluster.
It is installed on stat1004 and stat1005 alongside Spark 1.6. To use Spark
2, run the spark2* executables (or pyspark2 for Python) rather than the
usual spark-shell, spark-submit, etc., which still launch Spark 1.6.
I’ve added a little bit of documentation
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark> about
this on wikitech.
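Since both versions live side by side, here is a rough sketch of how you might confirm which one a session is actually running. The helper names (spark_major_version, describe_session) are made up for illustration and are not part of Spark or our tooling; the only Spark API it relies on is the session's version attribute.

```python
def spark_major_version(version):
    """Return the major version number from a string such as '2.1.0'."""
    return int(version.split(".")[0])


def describe_session(spark):
    """Return a one-line summary of the Spark version behind `spark`.

    `spark` is assumed to be a SparkSession (or any object exposing a
    `version` string attribute).
    """
    major = spark_major_version(spark.version)
    label = "Spark 2" if major == 2 else "not Spark 2"
    return "Spark %s (%s)" % (spark.version, label)
```

Assuming pyspark2 behaves like the stock Spark 2 PySpark shell and predefines a session object named spark, you could paste the functions in and run print(describe_session(spark)) to see which version you ended up with.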
We’d like to deploy Spark 2.2, but we first need to upgrade Hadoop to use
Java 8 rather than Java 7. Hopefully this will happen in early 2018.
analytics/refinery/source
<https://github.com/wikimedia/analytics-refinery-source> still uses Spark
1, but we’d also like to update jobs and dependencies there to use Spark 2
soon.
Anyway, let me know if there are any questions. Enjoy!
- Andrew Otto
Systems Engineer, WMF