Thanks Nicholas for the response, and apologies that this isn't threaded; I
was subscribed only to a daily digest.
Here's a version of the notebook that (sometimes) shows the lost connection
problem.
https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro…
It either fails directly with an OOM error or we lose the connection to the
server. I think the cause is simply that it's a long-running query with a
large result set. Perhaps PAWS just isn't the right fit for these types of
queries? I'm not sure what tuning I can do, either to the PAWS config or to
the query itself, so I think I need to learn more about other execution
environments.
In any case, I have a way of running the query with minimal postprocessing
that doesn't OOM; I can write the result to disk and download it to my local
machine to play with. That's fine for now while I poke around the dataset.
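
A minimal sketch of that kind of approach, not necessarily the code in the notebook above: pull rows from a DB-API cursor in fixed-size batches and append them straight to a CSV file, so only one batch is held in memory at a time. The function name, table, columns, and output file name are all hypothetical, and an in-memory sqlite3 database stands in for the real replica so the example runs anywhere. With a MySQL client such as pymysql, an unbuffered cursor (`pymysql.cursors.SSCursor`) would presumably be needed as well, since the default cursor buffers the whole result set on the client.

```python
import csv
import os
import sqlite3
import tempfile


def stream_query_to_csv(cursor, sql, out_path, batch_size=10_000):
    """Run `sql` and write its rows to `out_path` in batches.

    Returns the number of data rows written (header excluded).
    """
    cursor.execute(sql)
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Column names come from the DB-API cursor description.
        writer.writerow([col[0] for col in cursor.description])
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    return total


# Stand-in database (hypothetical table and columns) so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (img_name TEXT, img_size INTEGER)")
conn.executemany(
    "INSERT INTO image VALUES (?, ?)",
    [(f"Painting_{i}.jpg", i * 100) for i in range(25)],
)

out_path = os.path.join(tempfile.gettempdir(), "query_result.csv")
n_rows = stream_query_to_csv(
    conn.cursor(),
    "SELECT img_name, img_size FROM image",
    out_path,
    batch_size=10,
)
print(n_rows, "rows written to", out_path)
```

Keeping the postprocessing out of the loop means memory use is bounded by `batch_size` rather than by the size of the result set; the CSV can then be downloaded and analysed locally.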
Cheers,
Mat
hi all!
as part of task "Look into matching images of the same painting"
https://phabricator.wikimedia.org/T131553
i've been trying to reproduce some sql queries as described in
https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painti…
whereas these scripts would usually run under toolforge (or some other bot
execution environment i'm not sure of), i've been finding that these
long-running queries time out under PAWS
does anyone have suggestions / examples for running queries such as
http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wiki…
under PAWS?
cheers,
mat