Hello,
This discussion is probably not of wide interest to this public list. Shall we move it to analytics-internal?
Thanks,
Nuria
On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto otto@wikimedia.org wrote:
Hm, interesting! I don't think many of us have used SparkSession.builder.getOrCreate repeatedly in the same process. What happens if you manually stop the Spark session first (session.stop(): https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.stop), or maybe try to explicitly create a new session via newSession() (https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession)?
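Something like this, roughly (untested sketch; assumes `spark` is the session object your notebook is holding):

    from pyspark.sql import SparkSession

    spark.stop()  # fully shut down the old session and its SparkContext
    spark = SparkSession.builder.getOrCreate()  # should now build a fresh session
    # or, if the underlying SparkContext is still alive, fork a new session
    # that shares that context:
    spark2 = spark.newSession()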
On Thu, Feb 6, 2020 at 7:31 PM Neil Shah-Quinn nshahquinn@wikimedia.org wrote:
Hi Luca!
Those were separate Yarn jobs I started later. When I got this error, I found that the Yarn job corresponding to the SparkContext was marked as "successful", but I still couldn't get SparkSession.builder.getOrCreate to open a new one.
Any idea what might have caused that, or how I could recover without restarting the notebook? Restarting could mean losing a lot of in-progress work. I had already restarted that kernel, so I don't know whether I'll encounter this problem again. If I do, I'll file a task.
On Wed, 5 Feb 2020 at 23:24, Luca Toscano ltoscano@wikimedia.org wrote:
Hey Neil,
There were two Yarn jobs related to your notebooks still running; I just killed them. Let's see if that solves the problem (you might need to restart your notebook again). If not, let's open a task and investigate :)
Luca
On Thu, Feb 6, 2020 at 2:08 AM Neil Shah-Quinn nshahquinn@wikimedia.org wrote:
Whoa! I just got the same stopped SparkContext error on the query even after restarting the notebook, without an intermediate Java heap space error. That seems very strange to me.
On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn nshahquinn@wikimedia.org wrote:
Hey there!
I was running SQL queries via PySpark (using the wmfdata package: https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/hive.py) on SWAP when one of my queries failed with "java.lang.OutOfMemoryError: Java heap space".
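For context, the calls looked roughly like this (simplified; see the hive.py linked above for the real code):

    import wmfdata

    # runs the SQL through a shared PySpark session and returns the result
    result = wmfdata.hive.run("SELECT ... FROM ...")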
After that, when I tried to call the spark.sql function again (via wmfdata.hive.run), it failed with "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext."
When I tried to create a new Spark session using SparkSession.builder.getOrCreate (whether via wmfdata.spark.get_session or directly), it returned a SparkSession object as expected, but calling that object's sql function still gave the "stopped SparkContext" error.
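In other words, roughly this sequence (simplified from the actual wmfdata calls):

    from pyspark.sql import SparkSession

    # returns a SparkSession object without complaint...
    spark = SparkSession.builder.getOrCreate()
    # ...but using it still raises java.lang.IllegalStateException:
    # Cannot call methods on a stopped SparkContext.
    spark.sql("SELECT 1")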
Any idea what's going on? I assume restarting the notebook kernel would take care of the problem, but it seems like there has to be a better way to recover.
Thank you!
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics