Good morning!
The canary reboots last week went well, so we'll be upgrading and
rebooting the rest of the cloud over the course of the day today,
beginning in a few minutes.
As always, we'll do our best to minimize effects within Toolforge,
although it's always a good idea to make sure your jobs are still
running after windows like this. The VMs rebooted last week (listed
below) are already up to date, so they should be unaffected today.
-Andrew
On 1/11/18 3:15 PM, Andrew Bogott wrote:
Today's round of reboots is now finished -- the hosts rebooted are
listed below.
One correction: Monday is a holiday, so we're planning to reboot the
rest of the fleet on Tuesday, January 16th. Any VMs not in the list
below should anticipate downtime at some point on Tuesday.
-Andrew
On 1/11/18 1:02 PM, Andrew Bogott wrote:
In a few minutes I'm going to start the first round of reboots.
We're going to do a subset of the cloud and then make sure there are
no bad effects before doing the remainder on Monday.
The following VMs will be upgraded and rebooted over the next few hours:
aborrero-test: puppet-vm
account-creation-assistance: accounts-dbslave
analytics: hadoop-worker-3
analytics: k3-1
analytics: k3-2
automation-framework: af-debmonitor
automation-framework: af-puppetdb02
butterfly: butterfly-m4m
catgraph: fishbone
cvn: cvn-apache9
cvn: cvn-app8
cvn: cvn-app9
cyberbot: cyberbot-exec-01
cyberbot: cyberbot-exec-iabot-01
deployment-prep: deployment-cassandra3-02
deployment-prep: deployment-cpjobqueue
deployment-prep: deployment-kafka-jumbo-1
deployment-prep: deployment-memc05
deployment-prep: deployment-mx
deployment-prep: deployment-netbox
deployment-prep: deployment-redis01
deployment-prep: deployment-redis05
deployment-prep: deployment-sca01
deployment-prep: deployment-sca03
discovery-stats: language-detector-01
dwl: taxonbot
git: gerrit-test
git: gerrit-test3
glampipe: Glampipe
globaleducation: women-in-red
hhvm: hhvm-jmm
huggle: huggle-wl
integration: integration-slave-docker-1004
integration: integration-slave-docker-1005
integration: integration-slave-jessie-1003
integration: integration-slave-jessie-1004
kubernetes-testing: kmaster
language: language-dev
mediawiki-vagrant: mwv-stretch-migration
monitoring: filippo-test-jessie3
mwstake: mwstake
ogvjs-integration: media-streaming
otrs: otrs-oneclickspam-test
phabricator: puppet-phabricator
planet: puppenmeister
pluggableauth: cindy
pluggableauth: oidc-google
privpol-captcha: captcha-consul-32
privpol-captcha: captcha-tf-31
project-smtp: smtp-test1
rcm: oxygen
reading-web-staging: chromium-pdf
reading-web-staging: proton-staging
recommendation-api: recommendation-api-build
redirects: redirects-nginx01
scrumbugz: wikibase-docker-20171109-1
search: search-jessie
security-tools: jobs
security-tools: scanner00
security-tools: two-factor
security-tools: xsstest
sentry: sentry-builder
services: ceph-1
services: pdfservice
services: sca1
suggestbot: suggestbot-prod
swift: swift-prometheus
testlabs: puppet-compiler-tools
testlabs: puppet-compiler-v4-tools
testlabs: util-abogott
toolserver-legacy: relic
traffic: traffic-misc-varnish5
traffic: traffic-peerassist
traffic: traffic-upload-varnish5
ttmserver: ttmserver-elasticsearch01
ttmserver: ttmserver-salt01
twl: twlight-prod
twl: twlight-staging
wikibrain: wikibrain-embeddings-02
wikidata-dev: elastic-wikidata
wikidata-query: wdqs-deploy
wikidata-topicmaps: wtui-new
wikifactmine: elasticsearch-01
wikimania-support: scholarships-02
wmam: wikikids
yandex-proxy: yandex-proxy01
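For anyone who wants to check whether a particular VM was covered by this round, here is a minimal Python sketch. It assumes the list above has been saved as plain text with one `project: instance` pair per line. The function names and the `example-vm` instance are hypothetical and are used only for illustration.

```python
def parse_rebooted(listing: str) -> dict[str, set[str]]:
    """Group 'project: instance' lines by project name."""
    by_project: dict[str, set[str]] = {}
    for line in listing.splitlines():
        project, sep, instance = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'project: instance' pairs
        by_project.setdefault(project.strip(), set()).add(instance.strip())
    return by_project


def already_rebooted(by_project: dict[str, set[str]],
                     project: str, instance: str) -> bool:
    """Return True if the instance appears under the given project."""
    return instance in by_project.get(project, set())


# A few lines copied from the list above; paste the full list in practice.
SAMPLE = """\
analytics: k3-1
analytics: k3-2
cvn: cvn-app8
"""

rebooted = parse_rebooted(SAMPLE)
print(already_rebooted(rebooted, "analytics", "k3-1"))
# 'example-vm' is a made-up name standing in for a VM not on the list.
print(already_rebooted(rebooted, "analytics", "example-vm"))
```

A VM that is not found this way should be assumed to be part of the later round of reboots.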
On 1/4/18 9:28 AM, Andrew Bogott wrote:
Sometime soon (probably in the next day or two) we will be applying
kernel patches to all VMs and physical hosts in WMCS. This is to
address an urgent security issue[1], so we'll be skipping the
traditional 7-day warning period -- basically as soon as proper
fixes are available we'll start patching and rebooting.
As usual, we'll do our best to re-balance Toolforge grid nodes, so
impact on Toolforge users should be minimal (worst case you may need
to manually restart interrupted tasks).
For other users: if your VPS project requires special handling or
specific notice about when a particular VM will reboot, please add a
subtask describing your need to
https://phabricator.wikimedia.org/T184189 .
[1]
https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce