Due to <https://phabricator.wikimedia.org/T188684>, PAWS isn't really
a good environment for long-running unattended or under-attended
tasks. You can often make it just about work, but as you noticed, the
memory limits can also make such tasks more difficult.
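One way to stay under a memory limit is to avoid holding the whole result set at once. The sketch below is illustrative rather than anything taken from the notebook in question: `stream_query_to_csv` is a hypothetical helper that pulls rows from any DB-API cursor in batches and appends them to a CSV file. It assumes the driver really does stream rows; with pymysql, for example, the default cursor buffers the full result on the client, so an unbuffered cursor such as `pymysql.cursors.SSCursor` would be needed for the batching to help.

```python
import csv
import sqlite3  # only used as a stand-in database for local testing


def stream_query_to_csv(conn, sql, out_path, batch_size=10_000):
    """Run `sql` on a DB-API connection and write the rows to a CSV file
    in batches. Returns the number of data rows written.

    Hypothetical helper: only `batch_size` rows are held in Python at a
    time, provided the driver's cursor does not buffer the whole result.
    """
    cur = conn.cursor()
    cur.execute(sql)
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # The first element of each cursor.description entry is the column name.
        writer.writerow([col[0] for col in cur.description])
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            written += len(rows)
    cur.close()
    return written
```

For a local dry run, `sqlite3.connect(":memory:")` works as the connection; against the wiki replicas the connection object would come from whatever driver the notebook already uses.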
Once I start to hit PAWS limits, I usually switch to Toolforge. I've
found that writing simple HTML to a file in the static directory
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Static_file_server>
is usually a good intermediary between a PAWS notebook and writing a
full-blown webservice. You can easily open a Python container with
`webservice --backend=kubernetes python3.7 shell` and run things from
there. <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python> has
more information.
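As a rough sketch of the "simple HTML in the static directory" approach: the helper names below (`render_table`, `write_report`) are made up for illustration, and the target directory is passed in by the caller. I believe the static file server reads from `~/www/static` in the tool's home directory, but treat that path as an assumption and check the linked help page.

```python
import html
import os
import tempfile
from pathlib import Path


def render_table(headers, rows):
    """Return a minimal HTML page with one table; all cells are escaped."""
    head = "".join("<th>%s</th>" % html.escape(str(h)) for h in headers)
    body = "\n".join(
        "<tr>" + "".join("<td>%s</td>" % html.escape(str(c)) for c in row) + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Report</title></head>\n'
        "<body><table>\n<tr>%s</tr>\n%s\n</table></body></html>\n" % (head, body)
    )


def write_report(static_dir, name, headers, rows):
    """Write the report into `static_dir` and return the final path.

    The page is written to a temporary file in the same directory and then
    renamed, so a reader never sees a half-written file.
    """
    static_dir = Path(static_dir)
    static_dir.mkdir(parents=True, exist_ok=True)
    target = static_dir / name
    fd, tmp = tempfile.mkstemp(dir=str(static_dir), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(render_table(headers, rows))
    # mkstemp creates the file as 0600; make it readable by the web server.
    os.chmod(tmp, 0o644)
    os.replace(tmp, str(target))
    return target
```

A call might look like `write_report(Path.home() / "www" / "static", "report.html", ["page_title"], rows)`, run from a cron job or from the Python shell mentioned above.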
AntiCompositeNumber
On Tue, Dec 22, 2020 at 4:47 PM Mat Kelcey <matthew.kelcey(a)gmail.com> wrote:
Thanks, Nicholas, for the response. Apologies that this isn't threaded; I was subscribed
only to a daily digest.
Here's a version of the notebook that (sometimes) shows the lost connection problem.
https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro…
It either fails directly with OOM or we lose connection to the server; I think it's
as simple as it being just a long running query with a large result set. I'm thinking
perhaps PAWS just isn't right for these types of queries? Not sure what tuning I can
do, re: PAWS config or the query itself, I think I just need to learn more about other
execution environments.
In any case I have a way of running the query with minimal postprocessing that
doesn't OOM, that I can write to disk and download to my local machine to play with.
That's fine for now as I poke around with the dataset.
Cheers,
Mat
hi all!
as part of task "Look into matching images of the same painting"
https://phabricator.wikimedia.org/T131553
i've been trying to reproduce some sql queries as described in
https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painti…
whereas these scripts would usually run under toolforge (or some
other bot execution environment, i'm not sure which), i've been finding that these
long-running queries time out under PAWS
does anyone have suggestions / examples for running queries such as
http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wiki…
under PAWS?
cheers,
mat
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l