I'm just playing around on tools-sgebastion-08. I can dump the first 1 million image
names in about half a minute:
tools.spi-tools-dev:xw-join$ time mysql \
    --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud \
    commonswiki_p -N -e 'select img_name from image limit 1000000' > /dev/null
real 0m36.586s
user 0m9.678s
sys 0m1.324s
but if I try 10 million, it fails:
tools.spi-tools-dev:xw-join$ time mysql \
    --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud \
    commonswiki_p -N -e 'select img_name from image limit 10000000' > /dev/null
Killed
real 0m9.875s
user 0m1.417s
sys 0m1.561s
Is there some maximum result-set size configured by default? The full image table on
commons is about 70M rows, so extrapolating from the first example, it would take
something like an hour to move all that data.
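(Editor's note: "Killed" usually means the client process was OOM-killed while
buffering the whole result set in memory; the stock mysql client's `--quick` option
streams rows instead of caching them. Another common workaround is keyset pagination:
fetch fixed-size batches ordered by img_name, resuming each batch after the last name
seen. A minimal sketch, with a small in-memory list standing in for the image table
and `fetch_batch` faking the query
`SELECT img_name FROM image WHERE img_name > %s ORDER BY img_name LIMIT %s`:)

```python
# Keyset-pagination sketch: dump a large table in bounded-memory
# batches instead of one huge "LIMIT 100000000" query.
# TABLE is a stand-in for the sorted img_name column.
TABLE = sorted(f"Img_{i:04d}.jpg" for i in range(10))

def fetch_batch(after, limit):
    """Fake 'SELECT ... WHERE img_name > after ORDER BY img_name LIMIT limit'."""
    rows = [name for name in TABLE if name > after]
    return rows[:limit]

def dump_all(batch_size=3):
    """Yield every row, one bounded batch at a time."""
    last = ""  # empty string sorts before any image name
    while True:
        batch = fetch_batch(last, batch_size)
        if not batch:
            break
        yield from batch
        last = batch[-1]  # resume after the last name seen

print(len(list(dump_all())))  # → 10
```

Each round trip touches at most `batch_size` rows, so client memory stays flat no
matter how big the table is.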
On Mar 31, 2021, at 6:09 PM, Daniel Schwen
<lists(a)schwen.de> wrote:
I run FastilyBot on a Raspberry Pi, and needless to say it would be grossly impractical
for me to perform a "join" in the bot's code.
Why not run it on WMF Cloud? In-code joins will very likely work there, and Cloud is
supported. You are effectively asking us to also support a second approach here.
And I think it is by no means "needless to say" that this would be "grossly
impractical" in the bot code. Joining page titles from commons and en.wp could, for
example, be done by having the DB servers return sorted streams of page titles
(fetched incrementally with LIMIT). Your app code would advance whichever stream has
the lexicographically smaller title, and emit equal titles as matches. This would
require little compute on the Pi; however, it would transfer two big data streams
(maybe 2-3 GB), so feasibility would depend on your connection. But again, this
would be cheap on Cloud!
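(Editor's note: the merge step described above can be sketched in a few lines. The
two lists here are illustrative stand-ins for the sorted title streams coming from
the commons and en.wp replicas; the function and variable names are the editor's,
not from the thread.)

```python
# Merge join of two sorted streams: advance whichever side has the
# lexicographically smaller title; equal titles are matches.
def merge_join(stream_a, stream_b):
    """Yield titles present in both sorted input streams."""
    it_a, it_b = iter(stream_a), iter(stream_b)
    a = next(it_a, None)
    b = next(it_b, None)
    while a is not None and b is not None:
        if a < b:
            a = next(it_a, None)        # commons side is behind
        elif b < a:
            b = next(it_b, None)        # en.wp side is behind
        else:
            yield a                     # title exists on both wikis
            a = next(it_a, None)
            b = next(it_b, None)

commons = ["Apple.jpg", "Berlin.png", "Cat.jpg", "Zoo.svg"]
enwiki  = ["Berlin.png", "Dog.jpg", "Zoo.svg"]
print(list(merge_join(commons, enwiki)))  # → ['Berlin.png', 'Zoo.svg']
```

Because both inputs are consumed strictly in order, this holds only one title per
stream in memory at a time, which is why it stays cheap even on a Raspberry Pi.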
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud