might be a good choice. You are correct about
Cloud VPS instances. If your work becomes its own project, you are more
than welcome to request a new project on Cloud VPS.
On Thu, Apr 1, 2021 at 10:26 AM Roy Smith <roy(a)panix.com> wrote:
I'd like to continue exploring this, just not
quite sure of the
appropriate way forward. I gather doing work like this on the toolforge
bastion hosts is frowned upon, so I guess what I should be doing is
spinning up a VPS instance on
https://horizon.wikimedia.org/? I've been
reading through
https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances, from which
I gather I need to be a project admin. Is there some existing project I
can join to do this exploratory work, or should I create a new project?
On Mar 31, 2021, at 10:35 PM, Roy Smith <roy(a)panix.com> wrote:
Thanks for looking into this. I tried this again a little later, and it
ran fine. Odd that the amount of memory used depends on the number of
rows; I would expect it to stream results to stdout as they came in,
but apparently not.
Even weirder, the 100M example gets OOM-killed after 10s, while the 10M
example runs to completion in 36s. Could it be pre-allocating buffer space
for the number of rows it expects to ultimately get? Ugh, that would be a
crazy design, but it does seem like that's what's happening.
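One likely culprit (worth checking rather than taking my word for it) is that
the mysql command-line client buffers the whole result set in client memory by
default; its --quick (-q) option tells it to print rows as they arrive instead
of caching them. A sketch of the same dump with buffering disabled, using the
same replica.my.cnf credentials as in the commands quoted below, which I
haven't actually timed:

    tools.spi-tools-dev:xw-join$ time mysql --quick \
        --defaults-file=$HOME/replica.my.cnf \
        -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N \
        -e 'select img_name from image limit 100000000' > /dev/null

With --quick the client's memory use should stay roughly flat regardless of
row count, at the cost of holding the result open on the server for the whole
transfer.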
On Mar 31, 2021, at 9:47 PM, Brooke Storm <bstorm(a)wikimedia.org> wrote:
On Mar 31, 2021, at 5:18 PM, Roy Smith <roy(a)panix.com> wrote:
I'm just playing around on tools-sgebastion-08. I can dump the first 10
million image names in about half a minute:
tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf \
    -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N \
    -e 'select img_name from image limit 10000000' > /dev/null

real    0m36.586s
user    0m9.678s
sys     0m1.324s
but if I try 100 million, it fails:
tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf \
    -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N \
    -e 'select img_name from image limit 100000000' > /dev/null

Killed

real    0m9.875s
user    0m1.417s
sys     0m1.561s
Is there some maximum query size configured by default? The full image
table on Commons is about 70M rows, so extrapolating from the first
example, it should only take a few minutes to move all that data.
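If a single huge query keeps getting killed, a fallback would be to page
through the table on img_name in fixed-size batches instead of one giant
SELECT. A rough, untested sketch (the file names are placeholders, and the
naive quoting would need hardening for image names containing apostrophes):

    #!/bin/bash
    # Batched dump sketch: walk the image table in 1M-row chunks keyed on
    # img_name, appending each chunk to img_names.txt and resuming after
    # the last name seen.
    LAST=''
    : > img_names.txt
    while :; do
        mysql --defaults-file=$HOME/replica.my.cnf \
            -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N \
            -e "select img_name from image where img_name > '$LAST' order by img_name limit 1000000" \
            > batch.txt
        [ -s batch.txt ] || break      # empty batch: nothing left to fetch
        cat batch.txt >> img_names.txt
        LAST=$(tail -n 1 batch.txt)    # key for the next batch
    done

Each batch stays well under the bastion's memory limits, and the loop can be
restarted from the last line of img_names.txt if it dies partway through.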
That could be RAM limits on the bastion.
Actually, scratch that, I’ve confirmed you were killed by the OOM killer
on that bastion:
Mar 31 23:29:17 tools-sgebastion-08 kernel: [2860588.199138] mysql invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
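In case it's useful for next time: these show up in the kernel log, so
something along these lines should surface recent OOM kills, assuming the
account can read dmesg or the journal on that host (it may be restricted to
admins):

    dmesg -T | grep -iE 'oom-killer|Killed process'
    # or, on systemd hosts:
    journalctl -k --since today | grep -i oom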
-Brooke
--
Nicholas Skaggs
Engineering Manager, Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>