Hello everyone,
Please join us in celebrating a very successful Datacenter Switchover. This switch to our data center in Texas was run by Scott French, the newest addition to the SRE Service Operations team. It continues the tradition of successful switchovers and was completed without a hitch, with a read-only period of 2 minutes 46 seconds.
For context, the Site Reliability Engineering (SRE) team runs a planned data center switchover periodically, moving all wikis from our primary data center (for this instance, Virginia) to the secondary data center (for this instance, Texas). This is an important periodic test of our tools and procedures, to ensure the wikis will continue to be available even in the event of major technical issues. It also gives all our SRE and ops teams a chance to do maintenance and upgrades on systems that normally run 24 hours a day.
The switchover process requires a brief read-only period for all Foundation-hosted wikis, which started at 14:58 UTC on Wednesday, September 25th, and lasted 2 minutes and 46 seconds. All our public and private wikis continued to be available for reading as usual. Users saw a notification of the upcoming maintenance, and anyone still editing was asked to try again in a few minutes.
As with the previous Switchover, I've been trying to discern the effect of the Switchover in the many graphs we use to monitor the infrastructure at https://grafana.wikimedia.org/. In most, it's impossible to spot the event. We consider this very nice and attribute it to various improvements made over the years by many teams, in and outside SRE.
This switchover is our first where external and internal traffic flows exclusively to MediaWiki on Kubernetes, a fact that makes me personally pretty happy.
As per our newer process, we no longer have a Switchback. We will be staying in Texas as our primary data center for the next 6 months, switching back to Virginia on Wednesday, March 19. Per the same process, we'll also be in Single DC for the next week, going back to MultiDC on Wednesday, October 2nd.
As always, my deepest thanks to everyone who has helped with this, in one way or another, ranging from the person running point, to all SREs and developers/deployers who participated or contributed, to people in Movement Communications for helping with the messaging.
To report any issues, you can reach us in #wikimedia-sre on IRC, or file a Phabricator ticket with the datacenter-switchover tag; we'll be monitoring closely for reports of trouble. (If you're new to Phab, there's more information at Phabricator/Help.) The switchover, its preparation, and follow-up actions are tracked in Phabricator Task T370962.
wikitech-l@lists.wikimedia.org