Hello everyone,
TL;DR: Toolhub will have a few hours of downtime due to maintenance on Tuesday 2023-03-07. Apart from that, if you are not deploying services to the eqiad wikikube kubernetes cluster, you can safely skip the rest.
Long version:
We will reinitialize the eqiad wikikube kubernetes cluster using kubernetes version 1.23 on 2023-03-07 09:00-16:00 UTC [1] (the actual process is expected to take a couple of hours within this window). The date was chosen for convenience: due to the data center switchover process, eqiad is fully depooled and receiving almost no traffic. This is scheduled to change on 2023-03-08, which would make the process more difficult. As all traffic has already been drained, we expect no visible impact. However, for the duration of the process, the kubernetes cluster will be unavailable to deployers, so attempts to deploy to it will fail or, worse, not have the expected outcome. This is normal until SRE serviceops announces that the cluster is fully operational again.
SRE serviceops will be deploying all services before marking the cluster as usable so there will be no need for deployers to re-deploy their services (apart from those already informed).
Toolhub, per https://phabricator.wikimedia.org/T329319, wasn't switched over to codfw and is still being served from wikikube eqiad. Unavoidably, it will be down for a few hours. That is known and expected. To minimize that downtime, it will be prioritized during the initialization phase.
[1] https://phabricator.wikimedia.org/T331126
Hello everyone,
Upgrade done. The cluster has been successfully upgraded to 1.23 and applications have just been redeployed. Toolhub is operational again.
On Fri, Mar 3, 2023 at 3:45 PM Alexandros Kosiaris akosiaris@wikimedia.org wrote:
[...]
-- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation
wikitech-l@lists.wikimedia.org