Hi,
We will be upgrading the Toolforge Kubernetes cluster[0] on January
14th (next Wednesday) starting at around 14:00 UTC.
The expected impact is that all tool webservices and running jobs will
get restarted a couple of times over the course of the few hours it
takes for us to upgrade the entire cluster. The ability to manage
tools will remain operational.
[0]: https://phabricator.wikimedia.org/T413797
Taavi
--
Taavi Väänänen (he/they)
Site Reliability Engineer, Tools Infrastructure
Wikimedia Foundation
tl;dr:
All cloud-vps instances listed in this email will be rebooted, by me, on
Tuesday December 2nd. This list includes several bastion hosts, where
existing logins, long-running jobs, or tmux sessions will be interrupted.
If you would like more control, you can hard reboot your instances
anytime in the next week and then I will not need to reboot them on the
2nd. Make sure to use the 'Hard reboot instance' menu option; a soft
reboot is not enough to clear the network settings.
details:
We have recently updated our virtualization hardware to slightly
increase the MTU[0], allowing normal-sized packet transmission[1]. This
change was necessary in order to resolve an issue with Docker containers
and, potentially, other issues not yet revealed. Certain VMs that
existed before this update now have an MTU size that doesn't match the
new size set on all hypervisors; this limits our ability to migrate
those VMs and perform other routine maintenance tasks.
A stop/start or 'hard reboot' of affected instances resets the MTU, as
it rebuilds the virtual network stack associated with the host. A 'soft'
reboot (or simply rebooting from within the OS of the instance) leaves
the network stack intact, so it does not resolve the issue.
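
To see which MTU an instance currently has, the value can be read from
sysfs on a Linux guest. This is a minimal sketch, not an official WMCS
tool: the helper name `read_mtu` and the interface name `eth0` are
assumptions for illustration. The expected new MTU is deliberately not
hard-coded, since this email does not state it; see [1] for the actual
values.

```python
from pathlib import Path


def read_mtu(interface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the MTU the kernel reports for a network interface.

    Linux exposes the value as a decimal number in
    <sysfs_root>/<interface>/mtu. `sysfs_root` is a parameter only so
    the function can be pointed at a fake directory tree when testing.
    """
    mtu_file = Path(sysfs_root) / interface / "mtu"
    return int(mtu_file.read_text().strip())
```

Calling `read_mtu("eth0")` before and after a hard reboot (replace
`eth0` with whatever name appears under /sys/class/net on your
instance) should show whether the value changed.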
If you want to reboot your VMs before Tuesday, there's no need to notify
WMCS staff. I'll re-run the report that lists affected VMs immediately
before rebooting, and a hard reboot will remove your VMs from any future
such report.
-Andrew
[0] https://en.wikipedia.org/wiki/Maximum_transmission_unit
[1] https://phabricator.wikimedia.org/T408543
=====
accounts-appserver7.account-creation-assistance.eqiad1.wikimedia.cloud
deep-dive.analytics.eqiad1.wikimedia.cloud
gitlab-docker-runner-v2.analytics.eqiad1.wikimedia.cloud
T389375.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-pki-01.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-puppet-01.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-puppetdb-01.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-urldownloader-[01-02].appservers.eqiad1.wikimedia.cloud
bastion-eqiad1-[5-6].bastion.eqiad1.wikimedia.cloud
k3s.catalyst.eqiad1.wikimedia.cloud
k3s-envDB.catalyst.eqiad1.wikimedia.cloud
k3s-worker[01-02].catalyst.eqiad1.wikimedia.cloud
k3s-worker01.catalyst-dev.eqiad1.wikimedia.cloud
ntp-[5-6].cloudinfra.eqiad1.wikimedia.cloud
mediawiki2latex.collection-alt-renderer.eqiad1.wikimedia.cloud
copypatrol-backend-prod-02.copypatrol.eqiad1.wikimedia.cloud
cvn-apache11.cvn.eqiad1.wikimedia.cloud
cvn-app[13-14].cvn.eqiad1.wikimedia.cloud
deployment-poolcounter07.deployment-prep.eqiad1.wikimedia.cloud
gitlab-1002.devtools.eqiad1.wikimedia.cloud
gitlab-runner-[1007-1008].devtools.eqiad1.wikimedia.cloud
k3s-test.devtools.eqiad1.wikimedia.cloud
generator01.dumpstorrents.eqiad1.wikimedia.cloud
debian13-test.dwl.eqiad1.wikimedia.cloud
taxonbot4.dwl.eqiad1.wikimedia.cloud
tmp.entity-detection.eqiad1.wikimedia.cloud
runner-[1031-1040].gitlab-runners.eqiad1.wikimedia.cloud
wikiwho-dev.globaleducation.eqiad1.wikimedia.cloud
prod0.hashtags.eqiad1.wikimedia.cloud
language-lab.language.eqiad1.wikimedia.cloud
lpl-cx-sx2.language.eqiad1.wikimedia.cloud
lpl-recommend.language.eqiad1.wikimedia.cloud
lpl-services.language.eqiad1.wikimedia.cloud
logging-logstash-04.logging.eqiad1.wikimedia.cloud
mariadbcompiler-trixie.mariadbtest.eqiad1.wikimedia.cloud
ci2.mediawiki-quickstart.eqiad1.wikimedia.cloud
coder-env-1.mobileappsperformance.eqiad1.wikimedia.cloud
wikiapiary.mwstake.eqiad1.wikimedia.cloud
filippo-centrallog-02.o11y.eqiad1.wikimedia.cloud
filippo-cloudcephosd-01.o11y.eqiad1.wikimedia.cloud
filippo-clouddumps-01.o11y.eqiad1.wikimedia.cloud
filippo-cloudgw-01.o11y.eqiad1.wikimedia.cloud
filippo-cloudvirt-[01-02].o11y.eqiad1.wikimedia.cloud
phi-alert-01.o11y.eqiad1.wikimedia.cloud
phi-arclamp-01.o11y.eqiad1.wikimedia.cloud
phi-grafana-01.o11y.eqiad1.wikimedia.cloud
phi-kafka-01.o11y.eqiad1.wikimedia.cloud
phi-kafkamon-01.o11y.eqiad1.wikimedia.cloud
phi-lb-01.o11y.eqiad1.wikimedia.cloud
phi-mwlog-01.o11y.eqiad1.wikimedia.cloud
phi-pki-01.o11y.eqiad1.wikimedia.cloud
phi-prometheus-[01-02].o11y.eqiad1.wikimedia.cloud
phi-puppet-01.o11y.eqiad1.wikimedia.cloud
phi-syslog-01.o11y.eqiad1.wikimedia.cloud
phi-titan-01.o11y.eqiad1.wikimedia.cloud
phi-webperf-01.o11y.eqiad1.wikimedia.cloud
pixel.pixel.eqiad1.wikimedia.cloud
canasta-test.pluggableauth.eqiad1.wikimedia.cloud
dcl-dev1.puppet-dev.eqiad1.wikimedia.cloud
section-ranker.recommendation-api.eqiad1.wikimedia.cloud
semantic-search.recommendation-api.eqiad1.wikimedia.cloud
trixie.search.eqiad1.wikimedia.cloud
font-db.signwriting.eqiad1.wikimedia.cloud
dcl.swift.eqiad1.wikimedia.cloud
filippo-tom-k8s-worker-01.testlabs.eqiad1.wikimedia.cloud
filippo-tom-pki-01.testlabs.eqiad1.wikimedia.cloud
filippo-tom-puppetdb-01.testlabs.eqiad1.wikimedia.cloud
pontoon-demo-puppet-01.testlabs.eqiad1.wikimedia.cloud
tools-bastion-15.tools.eqiad1.wikimedia.cloud
tools-db-7.tools.eqiad1.wikimedia.cloud
tools-nfs-3.tools.eqiad1.wikimedia.cloud
toolsbeta-nfs-5.toolsbeta.eqiad1.wikimedia.cloud
toolsbeta-prometheus-2.toolsbeta.eqiad1.wikimedia.cloud
voterlists-1.voterlists.eqiad1.wikimedia.cloud
backend.wikicommunityhealth.eqiad1.wikimedia.cloud
uwl.wikicommunityhealth.eqiad1.wikimedia.cloud
wikibase-metadata.wikidata-dev.eqiad1.wikimedia.cloud
wikidata-reconciliation-trixie.wikidata-reconciliation.eqiad1.wikimedia.cloud
k3s.wikifunctions.eqiad1.wikimedia.cloud
wikipeoplestats-db01.wikipeoplestats.eqiad1.wikimedia.cloud
wsexport-app-prod01.wikisource.eqiad1.wikimedia.cloud
demo-wiki.wikispeech.eqiad1.wikimedia.cloud
glamspore-prod-01.wikispore.eqiad1.wikimedia.cloud
ctt-prv-04.wikitextexp.eqiad1.wikimedia.cloud
journalist1.wmgmc-monitoring.eqiad1.wikimedia.cloud
player1.wmgmc-monitoring.eqiad1.wikimedia.cloud
press1.wmgmc-monitoring.eqiad1.wikimedia.cloud
xtools-dev08.xtools.eqiad1.wikimedia.cloud
xtools-prod[14-15].xtools.eqiad1.wikimedia.cloud
zuul-haproxy-01.zuul.eqiad1.wikimedia.cloud
zuul-puppetserver-01.zuul.eqiad1.wikimedia.cloud
microk8s.zuul3.eqiad1.wikimedia.cloud
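
Some entries above use a bracketed range such as `[01-02]` or
`[1031-1040]`. To check a hostname against the list from a script, a
small helper like the one below can expand those entries. This is a
sketch under the assumption that each range is inclusive and keeps the
zero padding of its start value; the function names are illustrative
and not part of any WMCS tooling.

```python
import re

# Matches one bracketed numeric range such as "[01-02]" or "[1031-1040]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(entry: str) -> list[str]:
    """Expand the first "[start-end]" range found in a hostname entry.

    Zero padding is taken from the width of the start value, so
    "[01-02]" yields "01" and "02". Entries without a range are
    returned unchanged as a one-element list.
    """
    match = _RANGE.search(entry)
    if match is None:
        return [entry]
    start, end = match.group(1), match.group(2)
    width = len(start)
    prefix, suffix = entry[: match.start()], entry[match.end():]
    return [
        prefix + str(number).zfill(width) + suffix
        for number in range(int(start), int(end) + 1)
    ]


def is_listed(fqdn: str, entries: list[str]) -> bool:
    """Return True if fqdn appears in entries once ranges are expanded."""
    return any(fqdn in expand_hosts(entry) for entry in entries)
```

For example, `is_listed("ntp-5.cloudinfra.eqiad1.wikimedia.cloud",
entries)` is true when `entries` contains the `ntp-[5-6]` line above.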
I will be rolling out some network setting changes on the Cloud VPS
hypervisors and other networking equipment over the course of today.
Applying the changes on a hypervisor will cause network interruptions
for roughly 5-10 seconds for all VMs hosted on it.
For technical details, see [0].
[0]: https://phabricator.wikimedia.org/T330075
Taavi
--
Taavi Väänänen (he/they)
Site Reliability Engineer, Tools Infrastructure
Wikimedia Foundation
Hello, all!
If you use a cloud-vps project (other than toolforge), please update the
entry about your project on this page:
https://wikitech.wikimedia.org/wiki/News/2025_Cloud_VPS_Purge
There are detailed instructions on that page about how to annotate your
project. If your project is unclaimed after a few months, it will be
subject to suspension and, ultimately, deletion. Perhaps more
importantly, you will receive a huge number of ever-grumpier emails from
cloud administrators asking you to respond.
In previous years we've only asked that you mark projects as 'in use'.
This year we're also trying to gather summary information about the
actual purpose of each project as part of an initiative to clarify
use-cases of cloud-vps and toolforge; please include as much information
as you are able.
If you see an unclaimed project on that list which you use but are not
an admin of, feel free to make a note anyway, or reach out to your
admins and encourage them to do so.
Thank you!
-Andrew
Hi,
Starting today, Cloud VPS hosts no longer support SSH agent
forwarding[0]. This should not break any modern setups, but in case
your access is impacted by this change, please see the documentation
on Wikitech[1] on what to use instead. This change is being made due
to the security risks that SSH agent forwarding has, especially in
shared environments like Cloud VPS.
[0]: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/usi…
[1]: https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances
Taavi
--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation
Hello all,
the recent Toolforge NFS server OS upgrade has helped with the problem of
NFS workers getting stuck[0]. We will be performing a shorter follow-up
maintenance tomorrow, Wed 15th, 8-9 UTC.
Thank you for your understanding and patience.
best,
Filippo
[0]: https://phabricator.wikimedia.org/T404584
--
*Filippo Giunchedi*
Staff Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all,
On Monday 13th from 8 to 10 UTC there will be a maintenance window for
Toolforge NFS. We will strive to minimize the user-facing outage of NFS
and its related services. This maintenance window affects all tools with
NFS access enabled, the bastion servers, and tools-static hosted files.
Tools using build service images without NFS mounts [0] will not be
affected.
The maintenance window will be used to upgrade the Toolforge NFS server to
a Debian Trixie VM, which brings in more than two years of Linux
development. The upgrade, while part of the regular OS lifecycle, is also
aimed at narrowing down the NFS-related problems within Toolforge. Please
see [1] for more details.
best,
Filippo
[0]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_image…
[1]: https://phabricator.wikimedia.org/T404584
--
*Filippo Giunchedi*
Staff Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
(If you don't work with SHA1 values of revisions, you can ignore this
message)
Hello,
As part of performance improvements to the revision table, we are
reviewing the purpose and usage of the `rev_sha1` field.
Currently, this field is mainly used to detect identical revisions, for
example in manual revert detection. The `rev_sha1` value is calculated
from the SHA1 values of all slots in the revision, which are stored in
the content table:
- The SHA1 of a slot is computed from its content and encoded in base36.
- For revisions with only one slot (the case for all wikis except
  Commons), `rev_sha1` matches the SHA1 of that slot.
- On Commons, most revisions have two slots ("main" and "mediainfo"). In
  that case, the SHA1 of the revision is computed by concatenating the
  SHA1 values of both slots, then hashing that concatenated value again
  with SHA1.
We have decided to drop the `rev_sha1` field and compute the SHA1 value
of a revision on the fly from the `content_sha1` values in its slots.
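
For tools that need to reproduce the revision-level value themselves,
the rules above can be sketched roughly as follows. This is an
illustration rather than MediaWiki's actual code: the function names
are made up, and both the zero-padding of base36 values to 31
characters and the ordering of slots by role name are assumptions that
should be checked against real `content_sha1` data before relying on
them.

```python
import hashlib

_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(data: bytes, width: int = 31) -> str:
    """Return the SHA1 of data in base36, left-padded with zeros.

    The 31-character width is an assumption about the stored format.
    """
    number = int.from_bytes(hashlib.sha1(data).digest(), "big")
    digits = ""
    while number:
        number, remainder = divmod(number, 36)
        digits = _DIGITS[remainder] + digits
    return digits.rjust(width, "0")


def revision_sha1(slot_sha1s: dict[str, str]) -> str:
    """Combine per-slot content_sha1 values into one revision hash.

    slot_sha1s maps a slot role name (e.g. "main") to its content_sha1.
    A single slot is returned unchanged. With several slots, the hashes
    are folded in role-name order (an assumption): the running value and
    the next slot hash are concatenated and hashed again.
    """
    combined = None
    for role in sorted(slot_sha1s):
        sha1 = slot_sha1s[role]
        if combined is None:
            combined = sha1
        else:
            combined = base36_sha1((combined + sha1).encode("ascii"))
    if combined is None:
        raise ValueError("a revision needs at least one slot")
    return combined
```

With a single "main" slot, `revision_sha1` simply returns that slot's
`content_sha1`, which matches the one-slot case described above.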
The same change applies to the archive table: the `ar_sha1` field (for
deleted revisions) will also be removed.
If you currently use the `rev_sha1` or `ar_sha1` fields, please switch
to using `content_sha1` instead. Both fields will be removed from
wikireplicas in three weeks.
You can follow progress here: https://phabricator.wikimedia.org/T389026
Thank you,
Alexander Vorwerk — IRC: Zabe