Thanks Nicholas for the response. Apologies that this isn't threaded; I was subscribed only to a daily digest.
Here's a version of the notebook that (sometimes) shows the lost connection problem. https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro....
It either fails outright with an OOM or we lose the connection to the server. I think the cause is simply that it's a long-running query with a large result set. Perhaps PAWS just isn't the right fit for these types of queries? I'm not sure what tuning I can do to either the PAWS config or the query itself, so I think I just need to learn more about other execution environments.
In any case, I have a way of running the query with minimal postprocessing that doesn't OOM; I can write the result to disk and download it to my local machine to play with. That's fine for now while I poke around with the dataset.
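One way a write-to-disk approach like this could look is sketched below. It is not the notebook's actual code: the function name stream_query_to_csv and the chunk size are made up for illustration, and it assumes only a Python DB-API connection whose cursor supports fetchmany. The idea is to never hold the full result set in memory, just one chunk at a time.

```python
import csv


def stream_query_to_csv(conn, query, path, chunk_size=10_000):
    """Run `query` on a DB-API connection and write the rows to a CSV file.

    Rows are pulled with fetchmany() so at most `chunk_size` rows are held
    in Python at once. Returns the number of data rows written.
    (Hypothetical helper, named here only for illustration.)
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        total = 0
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            # Header row taken from the cursor's column metadata.
            writer.writerow([col[0] for col in cur.description])
            while True:
                rows = cur.fetchmany(chunk_size)
                if not rows:
                    break
                writer.writerows(rows)
                total += len(rows)
        return total
    finally:
        cur.close()
```

Note that fetchmany only helps if the client library doesn't buffer the whole result set itself. With pymysql, for example, that would presumably mean using an unbuffered cursor such as pymysql.cursors.SSCursor. This also does nothing about the query simply running too long on the server side.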
Cheers, Mat
hi all!
as part of the task "Look into matching images of the same painting" https://phabricator.wikimedia.org/T131553 i've been trying to reproduce some sql queries as described in
https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_paintin...
whereas these scripts would usually be running under toolforge (or some other bot execution environment, i'm not sure which), i've been finding that these long running queries time out under PAWS
does anyone have suggestions / examples for running queries such as
http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wikid...
under PAWS?
cheers, mat