Due to https://phabricator.wikimedia.org/T188684, PAWS isn't really a good environment for long-running unattended or under-attended tasks. You can often make it just about work, but as you noticed, the memory limits can also make such tasks more difficult.
Once I start to hit PAWS limits, I usually switch to Toolforge. I've found that writing simple HTML to a file in the static directory (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Static_file_server) is usually a good intermediate step between a PAWS notebook and a full-blown webservice. You can easily open a Python container with `webservice --backend=kubernetes python3.7 shell` and run things from there. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python has more information.
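As a rough illustration of the "simple HTML in the static directory" approach, here is a minimal sketch. The function name `write_html_report` and its parameters are made up for this example, and the output location is left to the caller; check the static file server page linked above for the directory that is actually served for your tool.

```python
import html
from pathlib import Path


def write_html_report(rows, columns, out_path, title="Query results"):
    """Render rows as a plain HTML table and write it to out_path.

    Hypothetical helper for illustration only; not part of any Toolforge API.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # Escape every value so titles or file names containing < or & stay valid HTML.
    header = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    lines = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
        "<table>",
        f"<tr>{header}</tr>",
    ]
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        lines.append(f"<tr>{cells}</tr>")
    lines += ["</table>", "</body></html>"]

    # Write to a temporary name first, then rename, so a half-written
    # page is never left at the final path.
    tmp_path = out_path.with_name(out_path.name + ".tmp")
    tmp_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    tmp_path.replace(out_path)
    return out_path
```

A call might look like `write_html_report(rows, ["file", "item"], "/path/to/static/paintings.html")`, where the path is a placeholder for wherever your tool's static files live.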
AntiCompositeNumber
On Tue, Dec 22, 2020 at 4:47 PM Mat Kelcey matthew.kelcey@gmail.com wrote:
Thanks Nicholas for the response. Apologies this isn't threaded; I was subscribed only to a daily digest.
Here's a version of the notebook that (sometimes) shows the lost connection problem. https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro....
It either fails directly with OOM or we lose connection to the server; I think it's as simple as it being a long-running query with a large result set. I'm thinking perhaps PAWS just isn't right for these types of queries? I'm not sure what tuning I can do, either to the PAWS config or to the query itself. I think I just need to learn more about other execution environments.
In any case I have a way of running the query with minimal postprocessing that doesn't OOM, that I can write to disk and download to my local machine to play with. That's fine for now as I poke around with the dataset.
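For anyone hitting the same wall, one way to keep memory flat is to stream the result set to disk in batches instead of loading it all at once. The sketch below is not the code from the notebook; `dump_query_to_csv` is a made-up name, and it assumes a standard Python DB-API cursor. Note that some drivers buffer the whole result client-side by default (with pymysql, for example, an unbuffered `SSCursor` would be needed for this to actually help).

```python
import csv


def dump_query_to_csv(cursor, sql, out_path, batch_size=10_000):
    """Run sql on a DB-API cursor and stream the rows to a CSV file.

    Hypothetical helper for illustration. Returns the number of rows written.
    """

    def clean(value):
        # Binary columns may come back as bytes; decode them so the CSV
        # holds readable text rather than b'...' literals.
        if isinstance(value, bytes):
            return value.decode("utf-8", errors="replace")
        return value

    cursor.execute(sql)
    n_rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # cursor.description holds one tuple per column; item 0 is the name.
        writer.writerow([col[0] for col in cursor.description])
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows([clean(v) for v in row] for row in batch)
            n_rows += len(batch)
    return n_rows
```

Only `batch_size` rows are held in Python at a time, so the postprocessing can then be done locally on the downloaded file.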
Cheers, Mat
hi all!
as part of task "Look into matching images of the same painting" https://phabricator.wikimedia.org/T131553 i've been trying to reproduce some sql queries as described in https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_paintin...
whereas usually these scripts would be running under toolforge (or some other bot execution environment i'm not sure of), i've been finding that these long-running queries time out under PAWS
does anyone have suggestions / examples for running queries such as http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wikid... under PAWS?
cheers, mat
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l