Hello Cloud Admins!
As part of https://phabricator.wikimedia.org/T174569 we have to alter some big tables. One of them is logging, which, for instance, takes around 8h to alter on wikidata, which is in the shard I am currently working on.
Because of the nature of the change (some columns being added) and ROW-based replication (which is what we use on the sanitarium hosts), this change needs to go through replication (from the sanitarium hosts, or their masters, to the labs servers).
This will generate lag, but if it is not done that way, replication will break until the columns are added on the labs hosts, and that is less desirable than replication lag.
I am planning to run the alter tomorrow or Monday (I will notify you when I start it) on the sanitarium host for s5. That means there will be lag on the labs servers for a few hours on the s5 instance. It will also affect s1 and s3, because we are using the same replication thread for those shards too (a FIXME we have pending).
s2, s4, s6 and s7 will remain unaffected as they have their own replication thread.
Should you have any questions, let me know!
Thanks,
Manuel.
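To make the shard impact described above concrete, here is a minimal Python sketch. It takes from the announcement only that s1, s3 and s5 share one replication thread towards the labs servers. It further assumes that each of the remaining shards has a thread of its own. The thread names and the `lagging_shards` helper are hypothetical and not part of any real tooling.

```python
from typing import List

# Hypothetical model of which shards share a replication thread towards labs.
# Only the s1/s3/s5 grouping comes from the announcement; the thread names are
# made up, and "one thread per remaining shard" is an assumption.
REPLICATION_THREAD = {
    "s1": "shared-s1-s3-s5",
    "s3": "shared-s1-s3-s5",
    "s5": "shared-s1-s3-s5",
    "s2": "s2",
    "s4": "s4",
    "s6": "s6",
    "s7": "s7",
}


def lagging_shards(altered_shard: str) -> List[str]:
    """Return every shard expected to lag while altered_shard is altered."""
    thread = REPLICATION_THREAD[altered_shard]
    return sorted(s for s, t in REPLICATION_THREAD.items() if t == thread)


if __name__ == "__main__":
    print(lagging_shards("s5"))  # s5 shares its thread with s1 and s3
    print(lagging_shards("s2"))  # assumed to have a thread of its own
```

Under these assumptions, altering any of s1, s3 or s5 delays all three, while altering s2 delays only s2.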
On Wed, Nov 15, 2017 at 9:48 AM, Manuel Arostegui manuel@wikimedia.org wrote:
> Should you have any questions, let me know!
Should we send a message to cloud-announce about this, or just be ready to tell people that the lag is a known issue due to production schema changes?
Bryan
On Wed, Nov 15, 2017 at 6:39 PM, Bryan Davis bd808@wikimedia.org wrote:
> Should we send a message to cloud-announce about this, or just be ready to tell people that the lag is a known issue due to production schema changes?
I don't think it is necessary to send an announcement about it; it is just maintenance. I would suggest you just point people to that task so they can know when other shards will be done too :-)
Manuel.
Hey Cloud Team,
I am now running this schema change on s3, for all the wikis (around 900). I have throttled it a bit and it has been running for an hour without any significant delay on the new replicas. labsdb1003 is delayed a bit, but it normally is lately, so I don't think it is related to this change. This should take another 15h or so to finish completely.
Cheers,
Manuel.
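The message does not say how the throttling was done. As one possible illustration, the Python sketch below waits before each wiki for as long as replica lag is above a threshold. Everything in it is hypothetical: `alter_one`, `current_lag`, the 60-second threshold and the pause length are assumptions, not the procedure actually used.

```python
import time
from typing import Callable, Iterable, List


def run_throttled(
    wikis: Iterable[str],
    alter_one: Callable[[str], None],
    current_lag: Callable[[], float],
    max_lag_seconds: float = 60.0,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Alter one wiki at a time, waiting while replica lag is above the limit."""
    done: List[str] = []
    for wiki in wikis:
        # Hold off on the next wiki until the replicas have caught up enough.
        while current_lag() > max_lag_seconds:
            sleep(pause_seconds)
        alter_one(wiki)  # hypothetical callback that runs the ALTER for this wiki
        done.append(wiki)
    return done


if __name__ == "__main__":
    # Dry run with stand-in callbacks; no database is contacted.
    lag_samples = iter([120.0, 30.0, 10.0])
    finished = run_throttled(
        ["wiki_a", "wiki_b"],
        alter_one=lambda wiki: print("altering", wiki),
        current_lag=lambda: next(lag_samples),
        sleep=lambda seconds: None,
    )
    print(finished)
```

Passing the lag check and the sleep function in as parameters keeps the loop testable without a database.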
On Wed, Nov 15, 2017 at 6:45 PM, Manuel Arostegui manuel@wikimedia.org wrote:
> I don't think it is necessary to send an announcement about it; it is just maintenance. I would suggest you just point people to that task so they can know when other shards will be done too :-)
> Manuel.
Hello,
I will be running this schema change on s2 on Monday. Expect some delay for s2 on the replicas.
Manuel.
On Wed, Nov 29, 2017 at 1:53 PM, Manuel Arostegui marostegui@wikimedia.org wrote:
> Hey Cloud Team,
> I am now running this schema change on s3, for all the wikis (around 900). I have throttled it a bit and it has been running for an hour without any significant delay on the new replicas. labsdb1003 is delayed a bit, but it normally is lately, so I don't think it is related to this change. This should take another 15h or so to finish completely.
> Cheers,
> Manuel.
Hello!
It is time for s4. I will be doing it tomorrow on the sanitarium master. There will be around 3h of delay, as the logging table is quite big and takes around 2-3h to ALTER.
Manuel.
On Thu, Dec 7, 2017 at 8:38 AM, Manuel Arostegui marostegui@wikimedia.org wrote:
> Hello,
> I will be running this schema change on s2 on Monday. Expect some delay for s2 on the replicas.
> Manuel.
Hello again!
I will be altering s1 tomorrow, early in the European morning. Expect some delay on labs!
Manuel.
On Tue, Dec 12, 2017 at 5:03 PM, Manuel Arostegui marostegui@wikimedia.org wrote:
> Hello!
> It is time for s4. I will be doing it tomorrow on the sanitarium master. There will be around 3h of delay, as the logging table is quite big and takes around 2-3h to ALTER.
> Manuel.
Happy new year!
Tomorrow I will deploy this change on s7, so expect some delay there.
Thanks,
Manuel.
On Mon, Dec 18, 2017 at 4:54 PM, Manuel Arostegui marostegui@wikimedia.org wrote:
> Hello again!
> I will be altering s1 tomorrow, early in the European morning. Expect some delay on labs!
> Manuel.
Hello,
I will alter s5 tomorrow. Expect some delay there.
On Wed, Jan 3, 2018 at 2:49 PM, Manuel Arostegui marostegui@wikimedia.org wrote:
> Happy new year!
> Tomorrow I will deploy this change on s7, so expect some delay there.
> Thanks,
> Manuel.
Hello,
The last shard, s8, will be altered tomorrow, so expect several hours of lag, as the wikidatawiki.logging table is quite big.
On Wed, Jan 10, 2018 at 1:23 PM, Manuel Arostegui marostegui@wikimedia.org wrote:
> Hello,
> I will alter s5 tomorrow. Expect some delay there.
cloud-admin@lists.wikimedia.org