Debian Stretch's security support ends in mid-2022, and the Foundation's
OS policy already discourages use of existing Stretch machines. That
means it's time for all project admins to start rebuilding their VMs
with Bullseye (or, if you must, Buster).
Any webservices created in Kubernetes in the last year or two are most
likely using Buster images already, so no action is needed for those.
Older Kubernetes jobs should be refreshed to use more modern images
whenever possible.
If you are still using the grid engine for webservices, we strongly
encourage you to migrate your jobs to Kubernetes. For other grid uses,
watch this space for future announcements about grid engine migration;
we don't yet have a solution prepared for that.
Details about the what and why for this process can be found here:
https://wikitech.wikimedia.org/wiki/News/Stretch_deprecation
Here is the deprecation timeline:
* March 2021: Stretch VM creation disabled in most projects
* July 6, 2021: Active support of Stretch ends, Stretch moves into LTS
<- You are Here ->
* January 1, 2022: Stretch VM creation disabled in all projects;
  deprecation nagging begins in earnest. Stretch alternatives will be
  available for tool migration in Toolforge.
* May 1, 2022: All active Stretch VMs will be shut down (but not deleted)
  by WMCS admins. This includes Toolforge grid exec nodes.
* June 30, 2022: LTS support for Debian Stretch ends; all Stretch VMs will
  be deleted by WMCS admins.
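For anyone scripting reminders around these dates, the timeline can be kept as sorted data and queried with a binary search. This is a minimal Python sketch, not an official tool: the phase labels paraphrase the list above, and the day used for the March 2021 milestone is an assumption because the announcement gives only the month.

```python
import bisect
from datetime import date

# Deprecation milestones from the announcement, in date order.
# ASSUMPTION: only "March 2021" is given for the first milestone; the 1st is used here.
MILESTONES = [
    (date(2021, 3, 1), "Stretch VM creation disabled in most projects"),
    (date(2021, 7, 6), "active support ended; Stretch is in LTS"),
    (date(2022, 1, 1), "Stretch VM creation disabled in all projects"),
    (date(2022, 5, 1), "active Stretch VMs shut down (not deleted)"),
    (date(2022, 6, 30), "Stretch LTS ended; Stretch VMs deleted"),
]
_STARTS = [start for start, _ in MILESTONES]


def phase_on(day: date) -> str:
    """Return the most recent milestone in effect on `day`."""
    # bisect_right counts the milestones that start on or before `day`.
    i = bisect.bisect_right(_STARTS, day)
    return MILESTONES[i - 1][1] if i else "before the deprecation timeline"


if __name__ == "__main__":
    print(phase_on(date(2021, 10, 1)))
```

A milestone takes effect on its own start date, which is why `bisect_right` is used rather than `bisect_left`.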
TL;DR:
* Let's Encrypt [0] TLS certificates are "signed" by "root"
certificates to create a chain of trust
* The oldest "root" signing certificate for LE certs (DST Root CA X3)
expired on 2021-09-30 [1]
* Deprecated Toolforge Kubernetes containers only knew this root
certificate and not the newer root certificate (ISRG Root X1)
* Update your tool to use a newer container image to fix this
We are starting to hear reports of tools that suddenly stopped working
on 2021-09-30. The common issue is accessing the APIs for Wikimedia
wikis.
The Wikimedia wikis use multiple TLS certificates issued by different
providers for redundancy and protection against a problem with a
single certificate provider. One of the certificate providers that we
use is Let's Encrypt (LE) [0]. LE certificates are themselves signed
by multiple "root" certificates to create a chain of trust that your
web browser or other TLS verifying software can trust. The oldest root
certificate (named "DST Root CA X3") used to sign the LE certificates
expired on 2021-09-30 [1]. Very old operating systems and some
compiled software do not have the newer root certificate (named "ISRG
Root X1") in their trusted certificate collection. These systems are
now rejecting LE certificates.
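To make the failure mode concrete, here is a toy Python model of chain-of-trust checking. It is a sketch under heavy simplifying assumptions, not how OpenSSL or any real TLS library works: there are no signatures or cryptography, each certificate is reduced to a subject, an issuer, and an expiry date, and only the DST Root CA X3 expiry comes from this message (the other dates and the hypothetical `tool.example.org` leaf are illustrative). A trust store that knows only DST Root CA X3 can no longer build a valid path after 2021-09-30, while one that also contains ISRG Root X1 still can.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: date  # last valid day (simplified to a date, no time of day)


MAX_DEPTH = 5  # guard against runaway searches in a malformed chain


def chain_is_trusted(cert: Cert, intermediates: List[Cert],
                     roots: Dict[str, date], today: date, depth: int = 0) -> bool:
    """Return True if some issuer path from `cert` ends at an unexpired trusted root."""
    if today > cert.not_after:
        return False  # this link in the chain has expired
    root_expiry = roots.get(cert.issuer)
    if root_expiry is not None and today <= root_expiry:
        return True  # issued by a root in the trust store that is still valid
    if depth >= MAX_DEPTH:
        return False
    # Otherwise try every intermediate whose subject matches this certificate's issuer.
    return any(
        chain_is_trusted(parent, intermediates, roots, today, depth + 1)
        for parent in intermediates
        if parent.subject == cert.issuer
    )


# Illustrative chain; only the DST Root CA X3 expiry comes from the announcement.
LEAF = Cert("tool.example.org", "R3", date(2021, 12, 1))
INTERMEDIATES = [
    Cert("R3", "ISRG Root X1", date(2025, 9, 15)),
    # ISRG Root X1 cross-signed by the old root, so old clients could still find a path.
    Cert("ISRG Root X1", "DST Root CA X3", date(2024, 9, 30)),
]
OLD_STORE = {"DST Root CA X3": date(2021, 9, 30)}            # knows only the old root
NEW_STORE = {**OLD_STORE, "ISRG Root X1": date(2035, 6, 4)}  # also knows the newer root

if __name__ == "__main__":
    day = date(2021, 10, 1)
    print("old store:", chain_is_trusted(LEAF, INTERMEDIATES, OLD_STORE, day))  # False
    print("new store:", chain_is_trusted(LEAF, INTERMEDIATES, NEW_STORE, day))  # True
```

The search returns as soon as it reaches a trusted, unexpired root, so adding ISRG Root X1 to the store is enough; the expired DST Root CA X3 path is simply never needed.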
In Toolforge, we think that this mainly affects tools running on the
Kubernetes cluster inside Debian Jessie-based containers. Specifically,
the "php5.6", "python", "python2", and "ruby2" containers are expected
to have issues with the root certificate expiration, based on what we
have found so far. Recommended replacement containers are "php7.4",
"python3.9", and "ruby25".
We also have reports of `mono` failing on the bastions and the grid
engine. We do not yet have a fix for this. It will require us to compile
and install a newer version of mono for everyone who is using it.
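If you keep a record of which container image each of your tools uses, a small lookup like this hypothetical Python sketch can flag the Jessie-based images named in this message and suggest the recommended replacement. It is not part of any Toolforge tooling; mapping both "python" and "python2" to "python3.9" is an assumption, and moving off "python2" also means porting the tool's code to Python 3.

```python
from typing import Optional

# Jessie-based images expected to fail, mapped to the recommended replacements.
# ASSUMPTION: "python" and "python2" both map to "python3.9"; the python2 case
# also requires porting the tool's code to Python 3.
REPLACEMENTS = {
    "php5.6": "php7.4",
    "python": "python3.9",
    "python2": "python3.9",
    "ruby2": "ruby25",
}


def suggest_image(current: str) -> Optional[str]:
    """Return a suggested replacement image, or None if `current` is not on the affected list."""
    return REPLACEMENTS.get(current.strip().lower())


if __name__ == "__main__":
    for image in ("php5.6", "python2", "php7.4"):
        print(image, "->", suggest_image(image) or "not on the affected list")
```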
Interested folks can follow the progress of our infrastructure updates in
response to this issue at T291387 [2].
[0]: https://letsencrypt.org/
[1]: https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/
[2]: https://phabricator.wikimedia.org/T291387
Bryan
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
Due to MediaWiki schema changes from:
https://phabricator.wikimedia.org/T291719
the abuse_filter_log.afl_filter column will be dropped from the wiki
replica views on 2021/10/06. We apologize for any inconvenience. Please
update your queries accordingly.
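A quick way to find saved queries that still reference the dropped column is a plain-text scan. This Python sketch is hypothetical: the query names and the `queries_needing_update` helper are illustrative, and `afl_filter_id` appears only as an example of a longer identifier that should not be flagged. Because it is a regex match rather than an SQL parser, it will also flag the name inside comments or string literals.

```python
import re
from typing import Dict, List

# The column being dropped from the wiki replica views. The \b word boundaries
# mean identifiers that merely start with this name are not flagged.
DROPPED_COLUMN = re.compile(r"\bafl_filter\b", re.IGNORECASE)


def queries_needing_update(queries: Dict[str, str]) -> List[str]:
    """Return the sorted names of saved queries whose SQL still mentions afl_filter."""
    return sorted(name for name, sql in queries.items() if DROPPED_COLUMN.search(sql))


if __name__ == "__main__":
    saved = {  # hypothetical saved queries
        "recent_hits": "SELECT afl_filter, afl_timestamp FROM abuse_filter_log LIMIT 10",
        "by_filter_id": "SELECT afl_filter_id FROM abuse_filter_log LIMIT 10",
    }
    print(queries_needing_update(saved))  # ['recent_hits']
```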
We will be upgrading PAWS Kubernetes on 2021/09/07 at 15:00 UTC. User
impact should be minimal, but you might see your notebook server stop
and restart at some point during the change.
Michael DiPietro
SRE
Wikimedia Cloud Services