On Wed, Nov 15, 2017 at 6:39 PM, Bryan Davis bd808@wikimedia.org wrote:
On Wed, Nov 15, 2017 at 9:48 AM, Manuel Arostegui manuel@wikimedia.org wrote:
Hello Cloud Admins!
As part of https://phabricator.wikimedia.org/T174569 we have to alter some big tables. One of them is logging, which on wikidata, for instance, takes around 8 hours. That is the shard I am currently working on.
Because of the nature of the change (some columns being added) and ROW-based replication (which is what we use in the sanitariums), this change needs to be done with replication (from the sanitariums, or their masters, to the labs servers).
This will obviously generate lag, but if it is not done that way it will break replication until the column is added on the labs hosts, and broken replication is less desirable than replication lag.
I am planning to run the alter probably tomorrow or Monday (I will notify when I start it) on the sanitarium host in s5. That means there will be lag on the labs servers, for a few hours, on the s5 instance (which will also affect s1 and s3, because we are using the same replication thread for those shards too, which is a FIXME we have pending).
s2, s4, s6 and s7 will remain unaffected as they have their own replication thread.
Should you have any questions, let me know!
Should we send a message to cloud-announce about this, or just be ready to tell people that the lag is a known issue due to production schema changes?
I don't think it is necessary to send an announcement about it; it is just maintenance. I would suggest you just point people to that task so they can know when the other shards will be done too :-)
Manuel.