Thanks, Nicholas, for the response. Apologies that this isn't threaded; I was subscribed only to a daily digest.
Here's a version of the notebook that (sometimes) shows the lost connection problem.
It either fails outright with an out-of-memory (OOM) error or we lose the connection to the server. I think the cause is simply that it's a long-running query with a large result set. Perhaps PAWS just isn't the right fit for these types of queries? I'm not sure what tuning I can do, either in the PAWS config or in the query itself; I think I just need to learn more about other execution environments.
In any case, I have a way of running the query with minimal postprocessing that doesn't OOM, and I can write the results to disk and download them to my local machine to play with. That's fine for now while I poke around with the dataset.
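
For anyone hitting the same thing, one general shape for that kind of workaround is to fetch the result set in batches and append each batch to a file, so the full result never sits in memory at once. The following is only a minimal sketch under stated assumptions, not the notebook's actual code: it uses sqlite3 from the standard library as a stand-in for the real database connection, and the function name, table, and batch size are all hypothetical. Note also that with a MySQL/MariaDB client such as pymysql, the default cursor buffers the whole result on the client, so an unbuffered cursor (pymysql.cursors.SSCursor) would likely be needed too; I haven't confirmed how that behaves on PAWS.

```python
import csv
import os
import sqlite3
import tempfile


def stream_query_to_csv(conn, sql, out_path, batch_size=10_000):
    """Run `sql` on a DB-API connection and write the rows to a CSV file
    in batches. Returns the number of data rows written.

    Hypothetical helper for illustration only.
    """
    cur = conn.cursor()
    cur.execute(sql)
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row taken from the cursor's column metadata.
        writer.writerow([col[0] for col in cur.description])
        while True:
            # Pull at most batch_size rows at a time instead of fetchall().
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    cur.close()
    return total


# Stand-in data: an in-memory SQLite table (assumption, not the real dataset).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, name TEXT)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)", [(i, f"row{i}") for i in range(1000)]
)

demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
written = stream_query_to_csv(
    demo_conn, "SELECT id, name FROM demo ORDER BY id", demo_path, batch_size=100
)
print(f"wrote {written} rows to {demo_path}")
```

The resulting CSV can then be downloaded and loaded locally, where memory is less of a constraint.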
Cheers,
Mat