Dear Wikitechians,
On *Wednesday March 1st*, the SRE team will run a planned data center switchover, moving all wikis from our primary data center in Virginia to the secondary data center in Texas. This is an important periodic test of our tools and procedures, to ensure the wikis will continue to be available even in the event of major technical issues in our primary home. It also gives all our SRE and ops teams a chance to do maintenance and upgrades on systems in Virginia that normally run 24 hours a day.
The switchover process requires a *brief read-only period for all Foundation-hosted wikis*, which will start at *14:00 UTC on Wednesday March 1st*, and will last for a few minutes while we execute the migration as efficiently as possible. All our public and private wikis will be continuously available for reading as usual, but no one will be able to save edits during the process. Users will see a notification of the upcoming maintenance, and anyone still editing will be asked to try again in a few minutes.
CommRel has already begun notifying communities of the read-only window. A similar event will follow a few weeks later, when we move back to Virginia. This is currently scheduled for *Wednesday, April 26th*.
If you like, you can follow along on the day in the public #wikimedia-operations channel on IRC (instructions for joining here https://meta.wikimedia.org/wiki/IRC/Instructions). To report any issues, you can reach us in #wikimedia-sre on IRC, or file a Phabricator ticket with the *datacenter-switchover* tag (pre-filled form here https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=Datacenter-Switchover&subscribers=Clement_Goubert); we'll be monitoring closely for reports of trouble during and after the switchover. (If you're new to Phab, there's more information at Phabricator/Help.) The switchover and its preparation are tracked in Phabricator task T327920 https://phabricator.wikimedia.org/T327920
On behalf of the SRE team, please excuse the disruption, and our thanks to everyone in a number of departments who've been involved in planning this work for the past weeks. Feel free to reply directly to me with any questions.
Thank you,
Dear Wikitechians,
I would like to remind you that the datacenter switchover will happen on *Wednesday March 1st* starting at *14:00 UTC.*
Please refer to the original email for any additional information. As always, you can reach out to me directly, to the SRE team in #wikimedia-sre on IRC, or through Phabricator with any questions.
Thank you,
-- Clément Goubert (they/them) Senior SRE Wikimedia Foundation
Dear Wikitechians,
Dear colleagues,
The switchover process required a *brief read-only period for all Foundation-hosted wikis*, which started at *14:00 UTC on Wednesday March 1st*, and lasted *119 seconds*. All our public and private wikis continued to be available for reading as usual. Users saw a notification of the upcoming maintenance, and anyone still editing was asked to try again in a few minutes.
As a side note, other SREs and I have been trying to discern the effect of the Switchover in the many graphs we use to monitor the infrastructure at https://grafana.wikimedia.org. In many of them, the event is impossible to spot. The most discernible graph we have is the edit rate, which can be viewed here: Grafana https://grafana-rw.wikimedia.org/d/000000208/edit-count?from=1677673800000&orgId=1&to=1677681000000. Can you spot it? See the attached picture to help:
I am extending thanks to everyone that was also present on IRC, helping out in any way that they could. Thanks as well to Community Relations who notified communities of the read-only window ahead of time. And thanks to everyone that contributed to MultiDC https://wikitech.wikimedia.org/wiki/Performance/Multi-DC_MediaWiki, especially Performance for pushing forward with the last parts of it, allowing us to perform this Switchover faster and with more confidence than ever before.
If you want to relive the Switchover, here's a link to a capture of Listen to Wikipedia https://en.wikipedia.org/wiki/Listen_to_Wikipedia during the Switchover: Listen to the Switchover https://drive.google.com/file/d/1jqQUVCq3ksjOM5bKoIfCZ5Zt9RRW1Nl_/view?usp=share_link (spoiler: the part with no sound is the switchover)
A similar event will follow a few weeks later, when we move back to Virginia. This is currently scheduled for *Wednesday, April 26th*.
Thank you,
-- Clément Goubert (they/them) Senior SRE Wikimedia Foundation
Clément Goubert and everybody,
I analyzed https://stream.wikimedia.org/v2/stream/recentchange and I got a different result.
Last change (before migration): 2023-03-01T14:00:30
First change (after migration): 2023-03-01T14:02:05
Result: Down time (14:00:31 to 14:02:05) is 94s.
I think this analysis is more authoritative. I think the stream's timestamps are based on something like REQUEST_TIME in PHP.
Dušan Kreheľ
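The 94 s figure above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the event timestamps have already been collected from the recentchange stream into a list of ISO 8601 strings, and it does not talk to the stream itself. The function name `longest_quiet_period` and the first and last sample timestamps are hypothetical; only the two middle timestamps come from the message above. It follows the message's convention of starting the count at the second after the last saved change.

```python
from datetime import datetime


def longest_quiet_period(timestamps):
    """Return (last_before, first_after, quiet_seconds) for the largest gap.

    quiet_seconds follows the convention used in the message above: the
    quiet period starts one second after the last change before the gap.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    if len(times) < 2:
        raise ValueError("need at least two timestamps to measure a gap")
    # Pair each event with its successor and keep the widest pair.
    before, after = max(zip(times, times[1:]), key=lambda p: p[1] - p[0])
    quiet_seconds = int((after - before).total_seconds()) - 1
    return before, after, quiet_seconds


sample = [
    "2023-03-01T14:00:29",  # hypothetical neighbouring event
    "2023-03-01T14:00:30",  # last change before the migration (from the message)
    "2023-03-01T14:02:05",  # first change after the migration (from the message)
    "2023-03-01T14:02:06",  # hypothetical neighbouring event
]

before, after, quiet_seconds = longest_quiet_period(sample)
print(before.time(), after.time(), quiet_seconds)  # 14:00:30 14:02:05 94
```

Counting from 14:00:30 itself instead would give 95 s, so the result depends slightly on which endpoint convention is chosen.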
It's a bit complicated. When SRE sets the read-only mark, they start counting from that moment. The mark then starts propagating, which takes a while before it is actually shown to all users, so some users might already see the RO error while actual writes are still happening somewhere else because the cache has not been invalidated yet (I think it has a TTL of 5 seconds, but I need to double-check). We still count that as RO time because it affects users regardless.
HTH
-- Amir (he/him)
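The propagation effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not MediaWiki's actual implementation: the class `CachedReadOnlyFlag`, the helper `flag_at`, the two servers and all the times are hypothetical, and the 5-second TTL is the unconfirmed figure from the message. Each server caches its view of a shared read-only flag and only re-reads it once the cached copy is older than the TTL.

```python
class CachedReadOnlyFlag:
    """One server's locally cached view of a shared read-only flag (toy model)."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._value = False
        self._fetched_at = None

    def is_read_only(self, now, source_of_truth):
        # Re-read the shared flag only when the cached copy has expired.
        expired = self._fetched_at is None or now - self._fetched_at >= self.ttl_s
        if expired:
            self._value = source_of_truth
            self._fetched_at = now
        return self._value


def flag_at(now, set_at=100.0):
    """Shared flag: read-only from t=100 s onwards (hypothetical time)."""
    return now >= set_at


server_a = CachedReadOnlyFlag(ttl_s=5.0)
server_b = CachedReadOnlyFlag(ttl_s=5.0)
server_a.is_read_only(99.0, flag_at(99.0))  # caches "writable" until t=104
server_b.is_read_only(96.0, flag_at(96.0))  # caches "writable" until t=101

# Two seconds after the flag is set, the servers disagree.
a_at_102 = server_a.is_read_only(102.0, flag_at(102.0))
b_at_102 = server_b.is_read_only(102.0, flag_at(102.0))
a_at_104 = server_a.is_read_only(104.0, flag_at(104.0))
print(a_at_102, b_at_102, a_at_104)  # False True True
```

In this model, server B already rejects edits at t=102 while server A still accepts them, which is why the operator's count and the timestamp of the last saved edit can differ by several seconds.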
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Specifically:
If we measure read-only time as "an editor can't start an edit because the wikis are read-only", then the read-only time is 119 seconds; if we measure it from the timestamp of the last edit being saved, it's 94 seconds.
As Amir explained, we leave some room for propagation of the MediaWiki read-only mode (about 10-15 seconds) and for in-flight edits (another 10 seconds) before we set the databases to read-only as well.
I think 2 minutes of read-only for such a complex operation is a good balance between reasonable change safety and reduction of impact; we could reduce the read-only time by another 10-20 seconds with some more aggressive moves (like clearing the DNS recursor caches), but I don't think there's much value in that at this point.
I'll add: if anyone is interested in knowing more and they're coming to the hackathon, I'll be happy to make an impromptu session about how we handle this procedure.
Cheers,
Giuseppe
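As a back-of-the-envelope check, the two measurements and the safety margins quoted above can be compared directly. The sketch below uses only figures from this thread (119 s, 94 s, 10-15 s of propagation, 10 s for in-flight edits); the variable names are illustrative, and this is not how SRE computes the read-only time.

```python
operator_read_only_s = 119   # counted from when the read-only mark is set
stream_gap_s = 94            # gap between saved edits in the recentchange stream
propagation_s = (10, 15)     # margin for propagating MediaWiki read-only mode
in_flight_s = 10             # margin for edits already in flight

# Time during which the mark was set but edits could still be saved.
difference_s = operator_read_only_s - stream_gap_s

margin_low = propagation_s[0] + in_flight_s
margin_high = propagation_s[1] + in_flight_s
consistent = margin_low <= difference_s <= margin_high

print(difference_s, (margin_low, margin_high), consistent)  # 25 (20, 25) True
```

The 25-second difference sits at the upper end of the 20-25 second margin, so the two numbers are consistent with each other rather than contradictory.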
Hello. Is the new file bug happening because of the database switch? And if so, will you fix it?
The story: if you re-upload a file, locally or on Commons, no thumbnail is created for the new version, and the old one is still shown in articles. The only way to see the new version is to open the file in the media: namespace. I waited for hours. So you can't edit files any more. I read the Commons re-upload log a bit, and it looks like it's not just my problem. The usual remedies (clearing the cache, purging, null edits) do not help.
I think a Phab task should be created, but I'd like to hear your answer first.
Thank you.
Igal (user:IKhitron)
בתאריך יום ד׳, 1 במרץ 2023, 19:26, מאת Giuseppe Lavagetto < glavagetto@wikimedia.org>:
Specifically:
if we measure read only time as "an editor can't start an edit because wikis are read only", then the read-only time is 119s; if we measure it by the last timestamp of an edit being saved, that's 94 seconds.
As Amir explained, we leave some room for propagation of the MediaWiki read-only mode (about 10-15 seconds) and for in-flight edits (another 10 seconds) before we set the databases to read-only as well.
I think 2 minutes of read-only for such a complex operation are the good balance between reasonable change safety and reduction of impact; we could reduce the read-only time by another 10-20 seconds with some more aggressive moves (like clearing the DNS recursor caches) but I don't think there's a big value there at this point.
I'll add: if anyone is interested in knowing more and they're coming to the hackathon, I'll be happy to make an impromptu session about how we handle this procedure.
Cheers,
Giuseppe
On Wed, Mar 1, 2023 at 6:16 PM Amir Sarabadani ladsgroup@gmail.com wrote:
It's a bit complicated. When SRE sets the read-only mark, they start counting from that time and it starts propagating which takes a while to be actually shown to all users but some users might still see the RO error while some actual writes are happening somewhere else because the cache is not invalidated yet (I think it has a TTL of 5 seconds but I need to double check). We still consider that as RO time because it's affecting users regardless.
HTH
Am Mi., 1. März 2023 um 18:06 Uhr schrieb Dušan Kreheľ < dusankrehel@gmail.com>:
Clément Goubert and everybody,
I analyzed https://stream.wikimedia.org/v2/stream/recentchange and i have the another results.
Last change (before migration): 2023-03-01T14:00:30 First change (after migration): 2023-03-01T14:02:05 Result: Down time (14:00:31 to 14:02:05) is 94s.
I think that analysis is more authoritative. I think it analyzes based on something like REQUEST_TIME in PHP.
Dušan Kreheľ
2023-03-01 16:30 GMT+01:00, Clément Goubert cgoubert@wikimedia.org:
Dear Wikitechians,
Dear colleagues,
The switchover process required a *brief read-only period for all Foundation-hosted wikis*, which started at *14:00 UTC on Wednesday March 1st*, and lasted *119 seconds*. All our public and private wikis continued to be available for reading as usual. Users saw a notification of the upcoming maintenance, and anyone still editing was asked to try again in a few minutes.
As a side note, with other SREs we have been trying to discern the effect of the Switchover in many of the graphs we have to monitor the infrastructure in https://grafana.wikimedia.org during Switchover. In many, it's impossible to spot the event. The most discernible graph we have is of the edit rate, which can be viewed here: Grafana <https://grafana-rw.wikimedia.org/d/000000208/edit-count?from=1677673800000&orgId=1&to=1677681000000>. Can you spot it? See the attached picture to help:
I am extending thanks to everyone that was also present on IRC, helping out in any way that they could. Thanks as well to Community Relations who notified communities of the read-only window ahead of time. And thanks to everyone that contributed to MultiDC https://wikitech.wikimedia.org/wiki/Performance/Multi-DC_MediaWiki, especially Performance for pushing forward with the last parts of it, allowing us to perform this Switchover faster and with more confidence than ever before.
If you wanna relive through the Switchover, here's a link to a capture of Listen to Wikipedia https://en.wikipedia.org/wiki/Listen_to_Wikipedia during the Switchover: Listen to the Switchover <https://drive.google.com/file/d/1jqQUVCq3ksjOM5bKoIfCZ5Zt9RRW1Nl_/view?usp=share_link> (spoiler: the part with no sounds is the switchover)
A similar event will follow a few weeks later, when we move back to Virginia. This is currently scheduled for *Wednesday, April 26th*.
Thank you,
On Tue, Feb 21, 2023 at 1:55 PM Clément Goubert <cgoubert@wikimedia.org> wrote:
Dear Wikitechians,
I would like to remind you that the datacenter switchover will happen on *Wednesday March 1st* starting at *14:00 UTC.*
Please refer to the original email for any additional information. As always, you can reach out to me directly or the SRE team in #wikimedia-sre on IRC with any question, or through Phabricator.
Thank you,
-- Clément Goubert (they/them) Senior SRE Wikimedia Foundation
You should probably just file a bug.
It's certainly plausible it had something to do with the data center switch, but it could just as easily be unrelated. It requires someone to investigate what part of the system failed (job queue? Varnish? Swift?), which would then lead to a root cause. It's pretty much impossible to say without further investigation, and speculating on list is probably not helpful.
-- Bawolff
On Wednesday, March 1, 2023, Igal Khitron khitron@gmail.com wrote:
Hello. Is the new files bug happening because of the database switch? And if it is, can you fix it? The story: if you reupload a file, locally or on Commons, no thumbnail is created for the new version, and the old one is still shown in articles. The only way to see the new version is to open the file in the media: namespace. I waited for hours. So you can't edit files any more. I read the Commons reupload log a bit, and it looks like it's not just my problem. The usual things (clear cache, purge, null edit) do not help. I think a Phab task should be created, but I'd like to know your answer first. Thank you. Igal (user:IKhitron)
On Wed, Mar 1, 2023 at 19:26, Giuseppe Lavagetto <glavagetto@wikimedia.org> wrote:
Specifically:
if we measure read only time as "an editor can't start an edit because wikis are read only", then the read-only time is 119s; if we measure it by the last timestamp of an edit being saved, that's 94 seconds.
As Amir explained, we leave some room for propagation of the MediaWiki read-only mode (about 10-15 seconds) and for in-flight edits (another 10 seconds) before we set the databases to read-only as well.
I think 2 minutes of read-only for such a complex operation is a good balance between reasonable change safety and reduction of impact; we could reduce the read-only time by another 10-20 seconds with some more aggressive moves (like clearing the DNS recursor caches) but I don't think there's much value there at this point.
I'll add: if anyone is interested in knowing more and they're coming to the hackathon, I'll be happy to make an impromptu session about how we handle this procedure.
Cheers,
Giuseppe
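The two figures in Giuseppe's message can be checked against the allowances he describes with a bit of arithmetic. The sketch below only restates numbers from the message; the variable names are made up for illustration.

```python
# Figures quoted in the message above, in seconds.
read_only_as_seen_by_editors = 119
gap_between_saved_edits = 94

# Allowances described in the message: about 10-15 s for the MediaWiki
# read-only mode to propagate, plus about 10 s for in-flight edits.
propagation_allowance = (10, 15)
in_flight_allowance = 10

difference = read_only_as_seen_by_editors - gap_between_saved_edits
expected_low = propagation_allowance[0] + in_flight_allowance
expected_high = propagation_allowance[1] + in_flight_allowance

print(difference)                                   # 25
print(expected_low <= difference <= expected_high)  # True
```

The 25-second difference sits at the upper end of the 20-25 second range implied by those allowances, so the two measurements are consistent with each other.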
On Wed, Mar 1, 2023 at 6:16 PM Amir Sarabadani ladsgroup@gmail.com wrote:
It's a bit complicated. When SRE sets the read-only mark, they start counting from that time. The mark then starts propagating, which takes a while to be actually shown to all users, so some users might already see the RO error while actual writes are still happening somewhere else because the cache is not invalidated yet (I think it has a TTL of 5 seconds but I need to double check). We still consider that as RO time because it's affecting users regardless.
HTH
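The propagation effect Amir describes can be pictured with a toy model. This is not how MediaWiki implements it: the `AppServer` class, the refresh rule and the server offsets are all hypothetical, and the 5-second TTL is the value the message itself is unsure about. The sketch only shows why, for a few seconds, some servers report read-only while others still accept writes.

```python
from dataclasses import dataclass

TTL = 5.0  # seconds; assumed, the message above is itself unsure of this value


@dataclass
class AppServer:
    """Hypothetical server that caches the read-only flag for TTL seconds."""
    cached_at: float
    cached_read_only: bool = False

    def sees_read_only(self, now: float, flag_set_at: float) -> bool:
        # Re-read the real flag only once the cached copy has expired.
        if now - self.cached_at >= TTL:
            self.cached_at = now
            self.cached_read_only = now >= flag_set_at
        return self.cached_read_only


def snapshot(servers, now, flag_set_at=0.0):
    """What each server reports at time `now` (flag set at t=0 by default)."""
    return [s.sees_read_only(now, flag_set_at) for s in servers]


# Four servers that last refreshed their cache at different moments
# (seconds relative to the flag being set at t=0).
servers = [AppServer(cached_at=t) for t in (-4.0, -2.5, -1.0, -0.5)]

one_second_in = snapshot(servers, now=1.0)
five_seconds_in = snapshot(servers, now=5.0)
print(one_second_in)    # [True, False, False, False]
print(five_seconds_in)  # [True, True, True, True]
```

One second in, only the server whose cache has just expired reports read-only; once a full TTL has passed, all of them do.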
I see. Very well, I'll open a task. Thank you. Igal
Hello everyone,
Just as Clément wrote in the first message, on *Wednesday April 26*, there will be a switchover from our secondary data center in Texas back to the primary data center in Virginia. That day, starting at *14:00 UTC*, there will be a brief (just a few minutes) read-only period for all Foundation-hosted wikis.
We have sent a message to all the wikis; there will also be a banner informing about the maintenance, displayed 30 minutes before this operation happens.
If you're interested in the details about the switchover, scroll up to Clément's message :)
Thanks,
Szymon Grabarczuk (he/him)
Senior Community Relations Specialist
Wikimedia Foundation https://wikimediafoundation.org/
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
wikitech-l@lists.wikimedia.org