tl;dr:
All cloud-vps instances listed in this email will be rebooted, by me, on
Tuesday, December 2nd. This list includes several bastion hosts, where
existing logins, long-running jobs, or tmux sessions will be interrupted.
If you would like more control, you can hard reboot your instances
anytime in the next week and then I will not need to reboot them on the
2nd. Make sure to use the 'Hard reboot instance' menu option; a soft
reboot is not enough to clear the network settings.
details:
We have recently updated our virtualization hardware to slightly
increase the MTU[0], allowing normal-sized packet transmission[1]. This
change was necessary in order to resolve an issue with Docker containers
and, potentially, other issues not yet revealed. Certain VMs that
existed before this update now have an MTU size that doesn't match the
new size set on all hypervisors; this limits our ability to migrate
those VMs and perform other routine maintenance tasks.
A stop/start or 'hard reboot' of affected instances resets the MTU, as
it rebuilds the virtual network stack associated with the host. A 'soft'
reboot (or simply rebooting from within the OS of the instance) leaves
the network stack intact, so it does not resolve the issue.
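If you want to see what MTU an instance currently reports before and after
a hard reboot, a minimal sketch along these lines may help. It assumes a
Linux guest that exposes interface settings under /sys/class/net, and it
assumes the mismatch is visible from inside the guest, which this email
does not confirm. The interface name and the expected MTU are values you
supply yourself; the function names are made up for illustration.

```python
from pathlib import Path


def read_mtu(interface, sysfs_root="/sys/class/net"):
    """Return the MTU the kernel reports for `interface`, as an int.

    `sysfs_root` is a parameter only so the function can be pointed at a
    different directory for testing; on a real Linux system the default
    should be correct.
    """
    mtu_file = Path(sysfs_root) / interface / "mtu"
    return int(mtu_file.read_text().strip())


def mtu_matches(interface, expected_mtu, sysfs_root="/sys/class/net"):
    """Return True if the interface's reported MTU equals `expected_mtu`.

    `expected_mtu` is an assumption you provide; this email does not state
    the new value, so check the linked task for it.
    """
    return read_mtu(interface, sysfs_root) == expected_mtu
```

For example, calling read_mtu("eth0") in a Python shell on the instance
would return the current value, where "eth0" is a placeholder for whatever
your primary interface is called.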
If you want to reboot your VMs before Tuesday, there's no need to notify
WMCS staff. I'll re-run the report that lists affected VMs immediately
before rebooting, and a hard reboot will remove your VMs from any future
such report.
-Andrew
[0] https://en.wikipedia.org/wiki/Maximum_transmission_unit
[1] https://phabricator.wikimedia.org/T408543
=====
accounts-appserver7.account-creation-assistance.eqiad1.wikimedia.cloud
deep-dive.analytics.eqiad1.wikimedia.cloud
gitlab-docker-runner-v2.analytics.eqiad1.wikimedia.cloud
T389375.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-pki-01.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-puppet-01.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-puppetdb-01.appservers.eqiad1.wikimedia.cloud
rn-hcptchprxy-urldownloader-[01-02].appservers.eqiad1.wikimedia.cloud
bastion-eqiad1-[5-6].bastion.eqiad1.wikimedia.cloud
k3s.catalyst.eqiad1.wikimedia.cloud
k3s-envDB.catalyst.eqiad1.wikimedia.cloud
k3s-worker[01-02].catalyst.eqiad1.wikimedia.cloud
k3s-worker01.catalyst-dev.eqiad1.wikimedia.cloud
ntp-[5-6].cloudinfra.eqiad1.wikimedia.cloud
mediawiki2latex.collection-alt-renderer.eqiad1.wikimedia.cloud
copypatrol-backend-prod-02.copypatrol.eqiad1.wikimedia.cloud
cvn-apache11.cvn.eqiad1.wikimedia.cloud
cvn-app[13-14].cvn.eqiad1.wikimedia.cloud
deployment-poolcounter07.deployment-prep.eqiad1.wikimedia.cloud
gitlab-1002.devtools.eqiad1.wikimedia.cloud
gitlab-runner-[1007-1008].devtools.eqiad1.wikimedia.cloud
k3s-test.devtools.eqiad1.wikimedia.cloud
generator01.dumpstorrents.eqiad1.wikimedia.cloud
debian13-test.dwl.eqiad1.wikimedia.cloud
taxonbot4.dwl.eqiad1.wikimedia.cloud
tmp.entity-detection.eqiad1.wikimedia.cloud
runner-[1031-1040].gitlab-runners.eqiad1.wikimedia.cloud
wikiwho-dev.globaleducation.eqiad1.wikimedia.cloud
prod0.hashtags.eqiad1.wikimedia.cloud
language-lab.language.eqiad1.wikimedia.cloud
lpl-cx-sx2.language.eqiad1.wikimedia.cloud
lpl-recommend.language.eqiad1.wikimedia.cloud
lpl-services.language.eqiad1.wikimedia.cloud
logging-logstash-04.logging.eqiad1.wikimedia.cloud
mariadbcompiler-trixie.mariadbtest.eqiad1.wikimedia.cloud
ci2.mediawiki-quickstart.eqiad1.wikimedia.cloud
coder-env-1.mobileappsperformance.eqiad1.wikimedia.cloud
wikiapiary.mwstake.eqiad1.wikimedia.cloud
filippo-centrallog-02.o11y.eqiad1.wikimedia.cloud
filippo-cloudcephosd-01.o11y.eqiad1.wikimedia.cloud
filippo-clouddumps-01.o11y.eqiad1.wikimedia.cloud
filippo-cloudgw-01.o11y.eqiad1.wikimedia.cloud
filippo-cloudvirt-[01-02].o11y.eqiad1.wikimedia.cloud
phi-alert-01.o11y.eqiad1.wikimedia.cloud
phi-arclamp-01.o11y.eqiad1.wikimedia.cloud
phi-grafana-01.o11y.eqiad1.wikimedia.cloud
phi-kafka-01.o11y.eqiad1.wikimedia.cloud
phi-kafkamon-01.o11y.eqiad1.wikimedia.cloud
phi-lb-01.o11y.eqiad1.wikimedia.cloud
phi-mwlog-01.o11y.eqiad1.wikimedia.cloud
phi-pki-01.o11y.eqiad1.wikimedia.cloud
phi-prometheus-[01-02].o11y.eqiad1.wikimedia.cloud
phi-puppet-01.o11y.eqiad1.wikimedia.cloud
phi-syslog-01.o11y.eqiad1.wikimedia.cloud
phi-titan-01.o11y.eqiad1.wikimedia.cloud
phi-webperf-01.o11y.eqiad1.wikimedia.cloud
pixel.pixel.eqiad1.wikimedia.cloud
canasta-test.pluggableauth.eqiad1.wikimedia.cloud
dcl-dev1.puppet-dev.eqiad1.wikimedia.cloud
section-ranker.recommendation-api.eqiad1.wikimedia.cloud
semantic-search.recommendation-api.eqiad1.wikimedia.cloud
trixie.search.eqiad1.wikimedia.cloud
font-db.signwriting.eqiad1.wikimedia.cloud
dcl.swift.eqiad1.wikimedia.cloud
filippo-tom-k8s-worker-01.testlabs.eqiad1.wikimedia.cloud
filippo-tom-pki-01.testlabs.eqiad1.wikimedia.cloud
filippo-tom-puppetdb-01.testlabs.eqiad1.wikimedia.cloud
pontoon-demo-puppet-01.testlabs.eqiad1.wikimedia.cloud
tools-bastion-15.tools.eqiad1.wikimedia.cloud
tools-db-7.tools.eqiad1.wikimedia.cloud
tools-nfs-3.tools.eqiad1.wikimedia.cloud
toolsbeta-nfs-5.toolsbeta.eqiad1.wikimedia.cloud
toolsbeta-prometheus-2.toolsbeta.eqiad1.wikimedia.cloud
voterlists-1.voterlists.eqiad1.wikimedia.cloud
backend.wikicommunityhealth.eqiad1.wikimedia.cloud
uwl.wikicommunityhealth.eqiad1.wikimedia.cloud
wikibase-metadata.wikidata-dev.eqiad1.wikimedia.cloud
wikidata-reconciliation-trixie.wikidata-reconciliation.eqiad1.wikimedia.cloud
k3s.wikifunctions.eqiad1.wikimedia.cloud
wikipeoplestats-db01.wikipeoplestats.eqiad1.wikimedia.cloud
wsexport-app-prod01.wikisource.eqiad1.wikimedia.cloud
demo-wiki.wikispeech.eqiad1.wikimedia.cloud
glamspore-prod-01.wikispore.eqiad1.wikimedia.cloud
ctt-prv-04.wikitextexp.eqiad1.wikimedia.cloud
journalist1.wmgmc-monitoring.eqiad1.wikimedia.cloud
player1.wmgmc-monitoring.eqiad1.wikimedia.cloud
press1.wmgmc-monitoring.eqiad1.wikimedia.cloud
xtools-dev08.xtools.eqiad1.wikimedia.cloud
xtools-prod[14-15].xtools.eqiad1.wikimedia.cloud
zuul-haproxy-01.zuul.eqiad1.wikimedia.cloud
zuul-puppetserver-01.zuul.eqiad1.wikimedia.cloud
microk8s.zuul3.eqiad1.wikimedia.cloud
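Several entries in the list above use a bracketed range such as [01-02] or
[1031-1040] to stand for more than one instance. If you want to check
whether a specific hostname is covered, a small sketch like the following
expands those entries into individual names. It assumes the brackets always
hold an inclusive numeric range and that zero-padding follows the width of
the first number; the function name is made up for illustration.

```python
import re

# Matches a bracketed inclusive numeric range such as [01-02] or [1031-1040].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand a hostname pattern with [N-M] ranges into a list of hostnames.

    A pattern without any bracketed range is returned unchanged as a
    single-element list.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    first, last = match.group(1), match.group(2)
    width = len(first)  # keep zero-padding, e.g. "01" stays two digits wide
    hosts = []
    for number in range(int(first), int(last) + 1):
        candidate = (
            pattern[: match.start()]
            + str(number).zfill(width)
            + pattern[match.end():]
        )
        # Recurse in case the pattern contains a further range.
        hosts.extend(expand_hosts(candidate))
    return hosts
```

For example, expand_hosts("k3s-worker[01-02].catalyst.eqiad1.wikimedia.cloud")
yields the two names k3s-worker01 and k3s-worker02 in that project.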
I will be rolling out some network setting changes on the Cloud VPS
hypervisors and other networking equipment over the course of today.
Applying the changes on a hypervisor will cause network interruptions
for roughly 5-10 seconds for all VMs hosted on it.
For technical details, see [0].
[0]: https://phabricator.wikimedia.org/T330075
Taavi
--
Taavi Väänänen (he/they)
Site Reliability Engineer, Tools Infrastructure
Wikimedia Foundation