Thanks for the response, Nicholas. Apologies that this isn't threaded; I was only subscribed to the daily digest.

Here's a version of the notebook that (sometimes) shows the lost-connection problem:
https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro.ipynb

It either fails directly with an OOM or we lose the connection to the server. I think it's as simple as it being a long-running query with a large result set. Perhaps PAWS just isn't the right fit for these types of queries? I'm not sure what tuning I can do, either to the PAWS config or to the query itself, so I think I just need to learn more about other execution environments.

In any case, I have a way of running the query with minimal postprocessing that doesn't OOM; I can write the result to disk and download it to my local machine to play with. That's fine for now while I poke around the dataset.
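
For anyone hitting the same thing, here is a rough sketch of the general idea (fetch in chunks and write each chunk straight to disk), not the exact notebook code. It assumes a standard Python DB-API cursor; the function name, table, and file name are made up for illustration, and an in-memory sqlite3 database stands in for the real replica connection so the snippet runs anywhere.

```python
import csv
import os
import sqlite3
import tempfile


def stream_query_to_csv(cursor, sql, out_path, chunk_size=10_000):
    """Run `sql` and write the rows to `out_path` chunk by chunk.

    Only `chunk_size` rows are held in Python at a time. Returns the
    number of data rows written (the header row is not counted).
    """
    cursor.execute(sql)
    header = [col[0] for col in cursor.description]
    n_rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
            n_rows += len(rows)
    return n_rows


# Stand-in data: a tiny hypothetical table instead of the real replica.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER, page_title TEXT)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [(i, f"File_{i}.jpg") for i in range(25)],
)

out_path = os.path.join(tempfile.mkdtemp(), "query_result.csv")
written = stream_query_to_csv(
    conn.cursor(),
    "SELECT page_id, page_title FROM page ORDER BY page_id",
    out_path,
    chunk_size=10,
)
print(written, "rows written to", out_path)
```

One caveat: with a MySQL client the loop only saves memory if the cursor is unbuffered (for example pymysql's SSCursor), since a default buffered cursor pulls the whole result set into memory on execute.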

Cheers,
Mat

> hi all!
>
> as part of task "Look into matching images of the same painting"
> <https://phabricator.wikimedia.org/T131553>
> i've been trying to reproduce some sql queries as described in
> https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painting_images.py
>
> where as usually these scripts would be running under toolforge (or some
> other bot execution environment i'm not sure of) i've been finding these
> long running queries timeout under PAWS
>
> does anyone have suggestions / examples for running queries such as
> http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wikidata_all.sql
> under PAWS?
>
> cheers,
> mat