On 2/14/18 6:58 AM, Chase Pettet wrote:

We lost a KVM host at around 7:20 UTC.  Because we use local storage for instances, a number of them are down.  Toolforge lost a few instances, but it seems to have been few enough that GridEngine and Kubernetes users are unaffected at this time.  The initial task is T187292 (with a list of instances), and an incident report will follow.  We hope to recover all of the instances that are down, but it will take time to sort through them.

This outage is still ongoing.

We're currently waiting on some on-site data center work (re-applying thermal paste to the hosts' CPUs) before determining exactly how to respond.  It still appears that no actual data has been lost, but the affected VMs will remain turned off for several more hours.

Here is the complete list of affected VMs:

accounts-appserver4.account-creation-assistance
accounts-mwoauth.account-creation-assistance
bastion-02.bastion
bastion-restricted-02.bastion
bf-wmpageview.butterfly
chat-bots.mobile
ci-jessie-wikimedia-965167.contintcloud
ci-jessie-wikimedia-965171.contintcloud
ci-jessie-wikimedia-965176.contintcloud
ci-jessie-wikimedia-965182.contintcloud
ci-jessie-wikimedia-965183.contintcloud
ci-jessie-wikimedia-965184.contintcloud
ci-jessie-wikimedia-965185.contintcloud
client.nonfreewiki
commonsarchive-production.commonsarchive
cxserver2.language
dashboardchat.globaleducation
deployment-changeprop.deployment-prep
deployment-elastic05.deployment-prep
deployment-ircd.deployment-prep
deployment-mathoid.deployment-prep
deployment-sca02.deployment-prep
drmf2016.math
huggle-pg.huggle
incubator-web.incubator
integration-slave-jessie-1001.integration
integration-slave-jessie-1002.integration
k8s-bastion.chasetestproject
language-mleb-master.language
ldfclient.wikidata-query
math-ru.math
mwaas-k8-node-02.scrumbugz
mwoffliner1.mwoffliner
mwv-apt-01.mwv-apt
newsletter-test.newsletter
ores-lb-02.ores
ores-worker-04.ores
overpass-wiki.maps
puppetmaster-keith.puppet
reflex2.design
rel.search
stack.reading-web-staging
tools-docker-builder-05.tools
tools-exec-1413.tools
tools-exec-1442.tools
tools-webgrid-lighttpd-1427.tools
tools-webgrid-lighttpd-1428.tools
torproxy.security-tools
udpmx-01.ircd
video-redis.video
wikidataconcepts.wikidataconcepts
wikiedu-dashboard-staging.globaleducation
wikilabels-experiment.wikilabels
wikilabels-staging-01.wikilabels
wikimetrics-staging.wikimetrics
wikimetrics-test.wikimetrics
wmde-wikidiff2-patched.wikidiff2-wmde-dev
zk1-1.analytics




--
Chase Pettet
chasemp on phabricator and IRC


_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce