Hi,
The reinitialization is now complete. Out of an abundance of caution,
the cluster will not be repooled right now but tomorrow morning (EU
time); it is otherwise fully operational. Deploys are fully functional
again, so if anything breaks, please let us know in Phabricator.
The related task is
https://phabricator.wikimedia.org/T277191 if you care
to follow the last few steps.
On Tue, Mar 16, 2021 at 10:31 AM Alexandros Kosiaris
<akosiaris(a)wikimedia.org> wrote:
Hello everyone,
TL;DR if you are not deploying services to the codfw kubernetes
cluster, you can safely skip this.
Long version:
After having tested our cluster reinitialization procedure twice, this
week we will be reinitializing our codfw kubernetes cluster. All
traffic will be drained from it beforehand and we expect no user
visible impact. However, for the duration of the process, the
kubernetes codfw cluster will be unavailable to deployers and thus
efforts to deploy to it will fail or, worse, not have the expected
outcomes. This is normal until SRE serviceops announces that the
cluster is fully operational again.
SRE serviceops will be deploying all services before marking the
cluster as usable and pooling traffic back to it, so there will be no
need for deployers to re-deploy their services.
For your convenience, the list of services currently deployed on that
cluster is: apertium api-gateway blubberoid changeprop
changeprop-jobqueue citoid cxserver echostore eventgate-analytics
eventgate-analytics-external eventgate-logging-external eventgate-main
eventstreams eventstreams-internal linkrecommendation mathoid
mobileapps proton push-notifications recommendation-api sessionstore
similar-users termbox wikifeeds zotero
Regards,
--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation