TL;DR:
* Let's Encrypt [0] TLS certificates are "signed" by "root" certificates to create a chain of trust
* The oldest "root" signing certificate for LE certs (DST Root CA X3) expired on 2021-09-30 [1]
* Deprecated Toolforge Kubernetes containers only know this root certificate and not the newer root certificate (ISRG Root X1)
* Update your tool to a newer container to fix the problem
We are starting to hear reports of tools that suddenly stopped working on 2021-09-30. The common factor is that these tools access the APIs of the Wikimedia wikis.
The Wikimedia wikis use multiple TLS certificates issued by different providers for redundancy and protection against a problem with a single certificate provider. One of the certificate providers that we use is Let's Encrypt (LE) [0]. LE certificates are themselves signed by multiple "root" certificates to create a chain of trust that your web browser or other TLS-verifying software can validate. The oldest root certificate (named "DST Root CA X3") used to sign the LE certificates expired on 2021-09-30 [1]. Very old operating systems and some compiled software do not have the newer root certificate (named "ISRG Root X1") in their trusted certificate collection. These systems are now rejecting LE certificates.
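If you want to check whether a given Python environment's default trust store has the newer root, a minimal sketch like the one below may help. It looks at the certificate dictionaries returned by Python's `ssl.SSLContext.get_ca_certs()` and reports whether "ISRG Root X1" is present and whether "DST Root CA X3" has expired. The helper names (`check_trust_store` and friends) are hypothetical, and this only reflects the OpenSSL store Python uses, not the stores used by mono or other compiled software. Also note that `get_ca_certs()` does not list certificates from a `capath` directory until they have been used, so a missing entry is a hint rather than proof.

```python
import ssl
import time


def common_name(cert):
    """Return the subject commonName of a cert dict from SSLContext.get_ca_certs()."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def find_root(certs, name):
    """Return the first certificate whose subject commonName matches `name`, or None."""
    for cert in certs:
        if common_name(cert) == name:
            return cert
    return None


def is_expired(cert, now=None):
    """True if the certificate's notAfter time is in the past (relative to `now`)."""
    now = time.time() if now is None else now
    return ssl.cert_time_to_seconds(cert["notAfter"]) < now


def check_trust_store(certs, now=None):
    """Summarize whether the two Let's Encrypt roots are present and usable."""
    x1 = find_root(certs, "ISRG Root X1")
    x3 = find_root(certs, "DST Root CA X3")
    return {
        "isrg_root_x1_present": x1 is not None,
        "isrg_root_x1_valid": x1 is not None and not is_expired(x1, now),
        "dst_root_ca_x3_present": x3 is not None,
        "dst_root_ca_x3_expired": x3 is not None and is_expired(x3, now),
    }


if __name__ == "__main__":
    # Inspect the default trust store of this Python/OpenSSL installation.
    ctx = ssl.create_default_context()
    print(check_trust_store(ctx.get_ca_certs()))
```

Running it inside an old container versus a newer one should show whether `isrg_root_x1_present` differs between them.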
In Toolforge, we think that this mainly affects tools running on the Kubernetes cluster inside Debian Jessie based containers. Specifically the "php5.6", "python", "python2", and "ruby2" containers are expected to have issues with the LE certificate expiration based on what we have found so far. Recommended replacement containers are "php7.4", "python3.9", and "ruby25".
We also have reports of `mono` failing on the bastions and the grid engine. We do not have a fix for this yet; it will require us to compile and install a newer version of mono for everyone who uses it.
Interested folks can follow the progress of our infrastructure updates in response to this issue at T291387 [2].
[0]: https://letsencrypt.org/
[1]: https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/
[2]: https://phabricator.wikimedia.org/T291387
Bryan
One more tip in case it helps: if you're running into errors when using a Debian Stretch container or the grid, long-running processes might just need a restart to pick up the openssl/gnutls bug fixes that were installed earlier this month. See https://phabricator.wikimedia.org/T292263 for an example.
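One way to spot processes that still need that restart (an assumption on my part, not necessarily what was done in T292263) is to look for TLS libraries that a process still has mapped but that the package upgrade has since deleted on disk; Linux marks such mappings with " (deleted)" in `/proc/<pid>/maps`. The sketch below is Linux-only, and the `TLS_LIBS` name patterns and the helper names are hypothetical.

```python
import os

# Hypothetical substrings identifying the TLS libraries mentioned above; adjust as needed.
TLS_LIBS = ("libssl", "libcrypto", "libgnutls")
DELETED = " (deleted)"


def stale_tls_libs(maps_text):
    """Return sorted paths of TLS libraries that are mapped but deleted on disk."""
    stale = set()
    for line in maps_text.splitlines():
        if not line.endswith(DELETED):
            continue
        # Fields: address, perms, offset, dev, inode, pathname (pathname may contain spaces).
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue
        path = parts[5][: -len(DELETED)]
        if any(lib in os.path.basename(path) for lib in TLS_LIBS):
            stale.add(path)
    return sorted(stale)


def needs_restart(pid):
    """True if process `pid` still maps an old (deleted) TLS library."""
    with open(f"/proc/{pid}/maps") as fh:
        return bool(stale_tls_libs(fh.read()))


if __name__ == "__main__":
    # Check the current process as a smoke test; pass other PIDs you own instead.
    print(needs_restart(os.getpid()))
```

Any process for which `needs_restart()` returns True started before the upgrade and is still using the old library code.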
-- Legoktm