Just wanted to emphasize that this is a great effort and a huge step towards improving
our current reliability.
We should do more of this, and make it broader and more exhaustive.
On 06/28 12:33, Kunal Mehta wrote:
Today we switched over most services and traffic caches from the eqiad
(Virginia) datacenter to codfw (Texas) as part of improving our reliability.
The goal is to have this procedure working and regularly tested, so that it
is ready when we actually need it in an emergency.
We're only aware of one user-facing impact: for a short time, WDQS lag
detection was broken, affecting Wikidata bots that check it. This is tracked
Users will see a slight latency increase for now, as most user traffic
needs to talk to both the eqiad and codfw datacenters. This will go away
tomorrow once MediaWiki is switched over (keep reading).
Also, we were a bit delayed in starting today because of an issue causing
appservers to get stuck: <https://phabricator.wikimedia.org/T285634>.
== Services ==
Started at 14:29 UTC, officially finished at 15:09.
The main issues we ran into were:
* the helm-charts service is unusual in that it doesn't have a service IP,
which broke the automatic switchover verification. This required us to
manually check the services that come after it in the list, and then re-run
the cookbook while excluding it. Tracked as
* the restbase-async service has some special handling; we debated whether
to follow it and opted not to special-case it. Figuring out what to do
long-term is tracked in <https://phabricator.wikimedia.org/T285711>.
* the WDQS issue mentioned earlier.
== Traffic ==
Started at 15:43, finished at 15:45.
It took until ~16:25 for eqiad to mostly depool. There's not much else to
report; it went very smoothly.
== Tomorrow's MediaWiki switchover ==
Scheduled for 14:00 UTC <https://zonestamp.toolforge.org/1624888854>.
It is our goal to minimize the read-only time and make this a non-event from
a user perspective.
All of the coordination will take place in the #wikimedia-operations IRC
channel on Libera Chat. You're more than welcome to follow along, but if you
have questions, please ask them in #wikimedia-tech so the operations channel
doesn't get disrupted. The procedure that we'll be following is documented at
I'm planning to do one more "live test" later today and will announce it on
IRC when it starts.
Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."