great, thanks for the toolforge tips!
for now i think i can go a long way hacking locally on datasets, even if
it's a pain to get them.
but i can totally see how toolforge is going to be perfect for anything i
need to repeat
great ecosystem! lots to learn
On Wed, 23 Dec 2020 at 10:59, AntiCompositeNumber <
anticompositenumber(a)gmail.com> wrote:
Due to <https://phabricator.wikimedia.org/T188684>, PAWS isn't really
a good environment for long-running unattended or under-attended
tasks. You can often make it just about work, but as you noticed, the
memory limits can also make such tasks more difficult.
Once I start to hit PAWS limits, I usually switch to Toolforge. I've
found that writing simple HTML to a file in the static directory
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Static_file_server>
is usually a good intermediary between a PAWS notebook and writing a
full-blown webservice. You can easily open a Python container with
`webservice --backend=kubernetes python3.7 shell` and run things from
there. <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python> has
more information.
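A minimal sketch of the "write simple HTML to a file in the static directory" idea, in Python. The function names, the table layout, and the sample rows are hypothetical and only for illustration. The demo writes to a temporary directory so it runs anywhere; on Toolforge the output directory would presumably be the tool's static directory described on the Static_file_server page linked above (assumed here to be `~/www/static`; check that page before relying on it).

```python
import html
import tempfile
from pathlib import Path


def render_table(title, header, rows):
    """Return a minimal HTML page holding one table; every value is HTML-escaped."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><h1>{html.escape(title)}</h1>\n"
        f"<table><tr>{head}</tr>\n{body}\n</table></body></html>\n"
    )


def write_report(out_dir, name, page):
    """Write the page to out_dir/name through a temp file, then rename it into place."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / name
    tmp = target.with_suffix(".tmp")
    tmp.write_text(page, encoding="utf-8")
    tmp.replace(target)  # readers never see a half-written file
    return target


if __name__ == "__main__":
    # Placeholder rows; in practice these would come from the query results.
    rows = [("Q1", "example_a.jpg"), ("Q2", "example_b.jpg")]
    page = render_table("Query results", ("item", "file"), rows)
    # Assumption: on Toolforge, replace the temp dir with the static directory.
    out = write_report(Path(tempfile.mkdtemp()) / "static", "report.html", page)
    print(out)
```

Running a script like this on a schedule and overwriting the same file keeps the latest results browsable without having to maintain a webservice.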
AntiCompositeNumber
On Tue, Dec 22, 2020 at 4:47 PM Mat Kelcey <matthew.kelcey(a)gmail.com>
wrote:
> Thanks Nicholas for the response, apologies this isn't threaded, I was
> subscribed only to a daily digest.
>
> Here's a version of the notebook that (sometimes) shows the lost
> connection problem.
> https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro…
>
> It either fails directly with OOM or we lose connection to the server; I
> think it's as simple as it being just a long running query with a large
> result set. I'm thinking perhaps PAWS just isn't right for these types of
> queries? Not sure what tuning I can do, re: PAWS config or the query
> itself, I think I just need to learn more about other execution
> environments.
>
> In any case I have a way of running the query with minimal
> postprocessing that doesn't OOM, that I can write to disk and download to
> my local machine to play with. That's fine for now as I poke around with
> the dataset.
>
> Cheers,
> Mat
>
> > hi all!
> >
> > as part of task "Look into matching images of the same painting"
> > https://phabricator.wikimedia.org/T131553
> > i've been trying to reproduce some sql queries as described in
> > https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painti…
> >
> > where as usually these scripts would be running under toolforge (or some
> > other bot execution environment i'm not sure of) i've been finding these
> > long running queries timeout under PAWS
> >
> > does anyone have suggestions / examples for running queries such as
> > http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wiki…
> > under PAWS?
> >
> > cheers,
> > mat
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l