If you were running a Toolforge web tool in Kubernetes before the toollabs-webservice label changes were deployed on 2021-09-29 (https://sal.toolforge.org/tools?d=2021-09-29), you may need to run `webservice stop && webservice start` to ensure your replica sets have the correct label expectations going forward. Otherwise you may run into confusing states when running `webservice restart` and similar commands.
When I backfilled the new labels, I missed that you cannot change the label matching rules in a deployment retroactively. I apologize for any inconvenience.
In summary: If you haven’t run a webservice stop since 2021-09-29 on your Kubernetes web service, it would be a good idea to stop and start your webservice now to prevent any confusing behavior from webservice in the future.
Wikimedia Cloud Services
Next Tuesday we will be upgrading Kubernetes on Toolforge. As part of the
upgrade we will need to restart all pods. This will produce a brief
interruption in web services and other tools that use Kubernetes. Assuming
your services are able to survive a restart, no action should be needed on
Michael + the WMCS team
* Let's Encrypt TLS certificates are "signed" by "root"
certificates to create a chain of trust
* The oldest "root" signing certificate for LE certs (DST Root CA X3)
expired on 2021-09-30 
* Deprecated Toolforge Kubernetes containers only knew this root
certificate and not the newer root certificate (ISRG Root X1)
* Update your tool to a newer container to fix
We are starting to hear reports of tools that suddenly stopped working
on 2021-09-30. The common issue is accessing the APIs for Wikimedia
The Wikimedia wikis use multiple TLS certificates issued by different
providers for redundancy and protection against a problem with a
single certificate provider. One of the certificate providers that we
use is Let's Encrypt (LE). LE certificates are themselves signed
by multiple "root" certificates to create a chain of trust that your
web browser or other TLS verifying software can trust. The oldest root
certificate (named "DST Root CA X3") used to sign the LE certificates
expired on 2021-09-30. Very old operating systems and some
compiled software do not have the newer root certificate (named "ISRG
Root X1") in their trusted certificate collection. These systems are
now rejecting LE certificates.
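
One rough way to check whether a given Python environment knows the newer root is to look for it in the default trust store. This is only a sketch: the helper name `trusts_root` is made up for illustration, and on systems that keep CA certificates in a hashed directory the certificates are loaded lazily, so a negative result here is not conclusive.

```python
import ssl


def trusts_root(ca_certs, common_name="ISRG Root X1"):
    """Return True if any entry in ca_certs has the given subject commonName.

    ca_certs is a list of dicts in the format returned by
    ssl.SSLContext.get_ca_certs(). (Hypothetical helper, not a Toolforge API.)
    """
    for cert in ca_certs:
        # 'subject' is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName" and value == common_name:
                    return True
    return False


if __name__ == "__main__":
    context = ssl.create_default_context()
    # Certificates from a CA directory (capath) are only loaded on demand,
    # so they may be trusted without being listed here.
    print("ISRG Root X1 listed:", trusts_root(context.get_ca_certs()))
```

If this prints `False` inside an old container while HTTPS requests to the wikis fail with certificate errors, the missing root is a likely cause.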
In Toolforge, we think that this mainly affects tools running on the
Kubernetes cluster inside Debian Jessie based containers. Specifically
the "php5.6", "python", "python2", and "ruby2" containers are expected
to have issues with the LE certificate expiration based on what we
have found so far. Recommended replacement containers are "php7.4",
"python3.9", and "ruby25".
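
For maintainers of several tools, the container names above can be kept as a small lookup table. This is a sketch only: pairing each old container with a replacement by language is our reading of the list above, the function name is hypothetical, and moving from "python2" to "python3.9" will normally also require porting code.

```python
# Affected Jessie-based containers and the recommended replacements named above.
# The pairing by language is an assumption made for illustration.
REPLACEMENTS = {
    "php5.6": "php7.4",
    "python": "python3.9",
    "python2": "python3.9",
    "ruby2": "ruby25",
}


def recommended_container(current):
    """Return the suggested replacement, or the input unchanged if not affected."""
    return REPLACEMENTS.get(current, current)


def is_affected(current):
    """Return True if the container is one of those expected to have LE issues."""
    return current in REPLACEMENTS
```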
We also have reports of `mono` on the bastions + grid engine failing.
We do not yet have a fix for this. It will require us to compile and
install a newer version of mono for everyone who is using it.
Interested folks can follow progress of our infrastructure updates in
response to this issue at T291387.
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
Due to MediaWiki schema changes from:
the abuse_filter_log.afl_filter column will be dropped from the wiki
replicas views on 2021/10/06. We apologize for any inconvenience. Please
update queries accordingly.
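
To find saved queries that still reference the column, a simple text scan is usually enough. The sketch below is an illustration under assumptions: the function name is hypothetical, and the scan is a plain regular expression that does not parse SQL, so it will also flag mentions inside comments or string literals.

```python
import re

# Match afl_filter as a whole identifier only; longer names that merely
# start with it (for example afl_filter_id) are not matched.
_DROPPED_COLUMN = re.compile(r"\bafl_filter\b", re.IGNORECASE)


def find_dropped_column(sql):
    """Return the 1-based line numbers of sql that mention afl_filter."""
    return [
        number
        for number, line in enumerate(sql.splitlines(), start=1)
        if _DROPPED_COLUMN.search(line)
    ]
```

An empty list means the query text does not mention the dropped column.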
Debian Stretch's security support ends in mid 2022, and the Foundation's
OS policy already discourages use of existing Stretch machines. That
means that it's time for all project admins to start rebuilding your VMs
with Bullseye (or, if you must, Buster.)
Any webservices running in Kubernetes created in the last year or two
are most likely using Buster images already, so there's no action needed
for those. Older Kubernetes jobs should be refreshed to use more modern
images whenever possible.
If you are still using the grid engine for webservices, we strongly
encourage you to migrate your jobs to Kubernetes. For other grid uses,
watch this space for future announcements about grid engine migration;
we don't yet have a solution prepared for that.
Details about the what and why for this process can be found here:
Here is the deprecation timeline:
* March 2021: Stretch VM creation disabled in most projects
* July 6, 2021: Active support of Stretch ends, Stretch moves into LTS
<- You are Here ->
* January 1st, 2022: Stretch VM creation disabled in all projects,
deprecation nagging begins in earnest. Stretch alternatives will be
available for tool migration in Toolforge
* May 1, 2022: All active Stretch VMs will be shut down (but not deleted)
by WMCS admins. This includes Toolforge grid exec nodes.
* June 30, 2022: LTS support for Debian Stretch ends, all Stretch VMs will
be deleted by WMCS admins
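
The timeline can also be written down as data, for example to have a script report which phase applies on a given day. This is a minimal sketch: the names are hypothetical, the short labels are paraphrases of the entries above, and the "March 2021" entry has no day in the announcement, so the first of the month is assumed.

```python
from datetime import date

# (start date, short label) pairs in chronological order.
MILESTONES = [
    (date(2021, 3, 1), "creation disabled in most projects"),  # day of month assumed
    (date(2021, 7, 6), "active support ended, LTS only"),
    (date(2022, 1, 1), "creation disabled in all projects"),
    (date(2022, 5, 1), "active Stretch VMs shut down"),
    (date(2022, 6, 30), "LTS ended, Stretch VMs deleted"),
]


def current_phase(today):
    """Return the label of the latest milestone reached by today, or None."""
    phase = None
    for start, label in MILESTONES:
        if today >= start:
            phase = label
    return phase
```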
We will be upgrading PAWS Kubernetes on 2021/09/07 at 1500UTC. User impacts
should be minimal, but you might see your notebook server stop and restart
during the change at some point.
Wikimedia Cloud Services
We will be upgrading PAWS Kubernetes today at 2030UTC. User impacts should
be minimal, but you might see your notebook server stop and restart during
the change at some point.
SRE Wikimedia Cloud Services
Quarry is currently running on Python 3.5 on Debian Stretch. This is the
version still running at quarry.wmflabs.org. A new version running
on Python 3.7 on Debian Buster is now available at quarry.wmcloud.org.
Interested parties, please test there; we will cut over the old domain
to the new Buster systems if no problems are found in a few days.
Yesterday the hardworking developers at The Debian Project finalized the
latest version of Debian Linux, 'Bullseye'. I've created a new
Bullseye base image for cloud-vps and it should now be accessible in all
There are likely to be bumps in the road with such a young release, but
the WMCS team is committed to supporting Bullseye so you should feel
confident adopting Bullseye for any new development. My cursory tests
look pretty good, but if you encounter issues specific to Bullseye
and cloud-vps, please create a Phabricator ticket or reply on the cloud
-Andrew + the WMCS team