New subject: [Ops] codfw kubernetes cluster upgrade this week

16 Mar 2021

Hello everyone,

TL;DR: if you are not deploying services to the codfw kubernetes
cluster, you can safely skip this.

Long version:

Having tested our cluster reinitialization procedure twice, we will be
reinitializing our codfw kubernetes cluster this week. All traffic will
be drained from it beforehand, and we expect no user-visible impact.
However, for the duration of the process, the codfw kubernetes cluster
will be unavailable to deployers, so attempts to deploy to it will fail
or, worse, not have the expected outcome. This is normal until SRE
serviceops announces that the cluster is fully operational again.

SRE serviceops will deploy all services before marking the cluster as
usable and pooling traffic back to it, so there will be no need for
deployers to re-deploy their services.

For your convenience, the services currently deployed on that cluster
are: apertium api-gateway blubberoid changeprop
changeprop-jobqueue citoid cxserver echostore eventgate-analytics
eventgate-analytics-external eventgate-logging-external eventgate-main
eventstreams eventstreams-internal linkrecommendation mathoid
mobileapps proton push-notifications recommendation-api sessionstore
similar-users termbox wikifeeds zotero

Regards,

-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation