Good suggestions, Andrew! I'll try those if I encounter this again.

Nuria, we had a discussion about the appropriate places to ask questions about internal systems in October 2018, and the verdict (supported by you) was that we should use this list or the public IRC channel.

If you want to revisit that decision, I'd suggest you consult that thread first (the subject was "Where to ask questions about internal analytics tools") because I included a detailed list of pros and cons of different channels to start the discussion. In that list, I even mentioned that such discussions on this channel could annoy subscribers who don't have access to these systems 🙂

If you still want us to use a different list, we can certainly do that. If so, please send my team a message and update the docs I added so it stays clear.

On Fri, 7 Feb 2020 at 07:48, Nuria Ruiz wrote:

This discussion is probably not of wide interest to this public list; I suggest moving it to analytics-internal.



On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto wrote:
Hm, interesting! I don't think many of us have used SparkSession.builder.getOrCreate repeatedly in the same process. What happens if you manually stop the Spark session first (session.stop()), or explicitly create a new session via newSession()?
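
For reference, the first suggestion above could be sketched roughly as follows. This is only a minimal sketch, not a confirmed fix: restart_session is a hypothetical helper (it is not part of PySpark or wmfdata), and it rests on the assumption that getOrCreate keeps handing back the cached session until that session is explicitly stopped. Note that, per the PySpark documentation, newSession() shares the SparkContext of the session it is called on, so it may not help on its own once that context has been stopped.

```python
def restart_session(builder, stale_session=None):
    """Stop the stale session (if any), then ask the builder for a session again.

    `builder` is expected to be `SparkSession.builder`. This helper is
    hypothetical and only illustrates the "stop first, then getOrCreate" idea.
    """
    if stale_session is not None:
        try:
            # stop() also stops the SparkContext underlying the session.
            stale_session.stop()
        except Exception as exc:
            # The context may already be dead on the JVM side; carry on anyway.
            print("Ignoring error while stopping the old session:", exc)
    # Assumption: with the old session stopped, this builds a fresh one
    # instead of returning the cached, unusable session.
    return builder.getOrCreate()


# Intended use in a notebook (assumes PySpark is installed and `spark` is the
# session that started failing):
#
#   from pyspark.sql import SparkSession
#   spark = restart_session(SparkSession.builder, stale_session=spark)
#   spark.sql("SELECT 1").show()
```

Passing the builder in, rather than importing it inside the helper, keeps the sketch usable with whatever builder configuration the notebook already relies on.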

On Thu, Feb 6, 2020 at 7:31 PM Neil Shah-Quinn wrote:
Hi Luca!

Those were separate Yarn jobs I started later. When I got this error, I found that the Yarn job corresponding to the SparkContext was marked as "successful", but I still couldn't get SparkSession.builder.getOrCreate to open a new one.

Any idea what might have caused that or how I could recover without restarting the notebook, which could mean losing a lot of in-progress work? I had already restarted that kernel so I don't know if I'll encounter this problem again. If I do, I'll file a task.

On Wed, 5 Feb 2020 at 23:24, Luca Toscano wrote:
Hey Neil,

There were two Yarn jobs running related to your notebooks. I just killed them; let's see if that solves the problem (you might need to restart your notebook again). If not, let's open a task and investigate :)


On Thu, 6 Feb 2020, Neil Shah-Quinn wrote:
Whoa—I just got the same stopped SparkContext error on the query even after restarting the notebook, without an intermediate Java heap space error. That seems very strange to me.

On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn wrote:
Hey there!

I was running SQL queries via PySpark (using the wmfdata package) on SWAP when one of my queries failed with "java.lang.OutOfMemoryError: Java heap space".

After that, when I tried to call the spark.sql function again, it failed with "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext."

When I tried to create a new Spark context using SparkSession.builder.getOrCreate (whether using wmfdata.spark.get_session or directly), it returned a SparkSession object properly, but calling the object's sql function still gave the "stopped SparkContext" error.

Any idea what's going on? I assume restarting the notebook kernel would take care of the problem, but it seems like there has to be a better way to recover.

Thank you!

