Good morning!
The canary reboots last week went well, so we'll be upgrading and rebooting the rest of the cloud over the course of the day today, beginning in a few minutes.
As always, we'll do our best to minimize effects within Toolforge, but it's a good idea to check that your jobs are still running after a window like this. The VMs rebooted last week (listed below) are already up to date, so they should be unaffected today.
-Andrew
On 1/11/18 3:15 PM, Andrew Bogott wrote:
Today's round of reboots is now finished -- the hosts rebooted are listed below.
One correction: Monday is a holiday, so we're planning to reboot the rest of the fleet on Tuesday, January 16th. Any VMs not in the list below should anticipate downtime at some point on Tuesday.
-Andrew
On 1/11/18 1:02 PM, Andrew Bogott wrote:
In a few minutes I'm going to start the first round of reboots. We're going to do a subset of the cloud and then make sure there are no bad effects before doing the remainder on Monday.
The following VMs will be upgraded and rebooted over the next few hours:
aborrero-test: puppet-vm
account-creation-assistance: accounts-dbslave
analytics: hadoop-worker-3
analytics: k3-1
analytics: k3-2
automation-framework: af-debmonitor
automation-framework: af-puppetdb02
butterfly: butterfly-m4m
catgraph: fishbone
cvn: cvn-apache9
cvn: cvn-app8
cvn: cvn-app9
cyberbot: cyberbot-exec-01
cyberbot: cyberbot-exec-iabot-01
deployment-prep: deployment-cassandra3-02
deployment-prep: deployment-cpjobqueue
deployment-prep: deployment-kafka-jumbo-1
deployment-prep: deployment-memc05
deployment-prep: deployment-mx
deployment-prep: deployment-netbox
deployment-prep: deployment-redis01
deployment-prep: deployment-redis05
deployment-prep: deployment-sca01
deployment-prep: deployment-sca03
discovery-stats: language-detector-01
dwl: taxonbot
git: gerrit-test
git: gerrit-test3
glampipe: Glampipe
globaleducation: women-in-red
hhvm: hhvm-jmm
huggle: huggle-wl
integration: integration-slave-docker-1004
integration: integration-slave-docker-1005
integration: integration-slave-jessie-1003
integration: integration-slave-jessie-1004
kubernetes-testing: kmaster
language: language-dev
mediawiki-vagrant: mwv-stretch-migration
monitoring: filippo-test-jessie3
mwstake: mwstake
ogvjs-integration: media-streaming
otrs: otrs-oneclickspam-test
phabricator: puppet-phabricator
planet: puppenmeister
pluggableauth: cindy
pluggableauth: oidc-google
privpol-captcha: captcha-consul-32
privpol-captcha: captcha-tf-31
project-smtp: smtp-test1
rcm: oxygen
reading-web-staging: chromium-pdf
reading-web-staging: proton-staging
recommendation-api: recommendation-api-build
redirects: redirects-nginx01
scrumbugz: wikibase-docker-20171109-1
search: search-jessie
security-tools: jobs
security-tools: scanner00
security-tools: two-factor
security-tools: xsstest
sentry: sentry-builder
services: ceph-1
services: pdfservice
services: sca1
suggestbot: suggestbot-prod
swift: swift-prometheus
testlabs: puppet-compiler-tools
testlabs: puppet-compiler-v4-tools
testlabs: util-abogott
toolserver-legacy: relic
traffic: traffic-misc-varnish5
traffic: traffic-peerassist
traffic: traffic-upload-varnish5
ttmserver: ttmserver-elasticsearch01
ttmserver: ttmserver-salt01
twl: twlight-prod
twl: twlight-staging
wikibrain: wikibrain-embeddings-02
wikidata-dev: elastic-wikidata
wikidata-query: wdqs-deploy
wikidata-topicmaps: wtui-new
wikifactmine: elasticsearch-01
wikimania-support: scholarships-02
wmam: wikikids
yandex-proxy: yandex-proxy01
On 1/4/18 9:28 AM, Andrew Bogott wrote:
Sometime soon (probably in the next day or two) we will be applying kernel patches to all VMs and physical hosts in WMCS. This is to address an urgent security issue [1], so we'll be skipping the traditional 7-day warning period -- as soon as proper fixes are available, we'll start patching and rebooting.
As usual, we'll do our best to re-balance Toolforge grid nodes, so impact on Toolforge users should be minimal (worst case you may need to manually restart interrupted tasks).
For other users: if your VPS project requires special handling or specific notice about when a particular VM will reboot, please add a subtask describing your need to https://phabricator.wikimedia.org/T184189 .
[1] https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce