We don't fully understand what happened, but after Giovanni performed a classic "turning it off and on again" things are now running without warnings. The VMs listed below are now coming back online and everything should be back up shortly.
We'll probably replace some of this hardware anyway, out of an abundance of caution, but that's unlikely to produce further downtime. With luck, this is the last you'll hear about this.
-Andrew
On 2/13/19 7:25 AM, Andrew Bogott wrote:
We're currently experiencing a mysterious hardware failure in our datacenter -- three different SSDs failed overnight, two of them in cloudvirt1018 and one of them in cloudvirt1024. The VMs on 1018 are down entirely. We may move those on 1024 to another host shortly in order to guard against additional drive failure.
There's some possibility that we will experience permanent data loss on cloudvirt1018, but everyone is working hard to avoid this.
The following VMs are on cloudvirt1018:
a11y | reading-web-staging
abogott-scapserver | testlabs
af-puppetdb01 | automation-framework
api | openocr
asdf | quotatest
bastion-eqiad1-02 | bastion
clm-test-01 | community-labs-monitoring
compiler1002 | puppet-diffs
cyberbot-exec-iabot-01 | cyberbot
deployment-db03 | deployment-prep
deployment-db04 | deployment-prep
deployment-memc05 | deployment-prep
deployment-pdfrender02 | deployment-prep
deployment-sca01 | deployment-prep
design-lsg3 | design
eventmetrics-dev01 | eventmetrics
fridolin | catgraph
gtirloni-puppetmaster-01 | testlabs
hadoop-master-3 | analytics
ign | ign2commons
integration-castor03 | integration
integration-slave-docker-1017 | integration
integration-slave-docker-1033 | integration
integration-slave-docker-1038 | integration
integration-slave-jessie-1003 | integration
integration-slave-jessie-android | integration
k8s-master-01 | general-k8s
k8s-node-03 | general-k8s
k8s-node-05 | general-k8s
k8s-node-06 | general-k8s
kdc | analytics
labstash-jessie1 | logging
language-mleb-legacy | language
login-test | catgraph
lsg-01 | design
mathosphere | math
mc-clusterA-1 | test-twemproxy
mwoffliner5 | mwoffliner
novaadminmadethis-4 | quotatest
ntp-01 | cloudinfra
ntp-02 | cloudinfra
ogvjs-testing | ogvjs-integration
phragile-pro | phragile
planet-hotdog | planet
pub2 | wikiapiary
puppenmeister | planet
puppet-compiler-v4-other | testlabs
puppet-compiler-v4-tools | testlabs
quarry-beta-01 | quarry
signwriting-swis | signwriting
signwriting-swserver | signwriting
social-tools3 | social-tools
striker-deploy04 | striker
striker-puppet01 | striker
t166878 | otrs
togetherjs | visualeditor
tools-sgebastion-06 | tools
tools-sgeexec-0902 | tools
tools-sgeexec-0903 | tools
tools-sgewebgrid-generic-0901 | tools
tools-sgewebgrid-lighttpd-0901 | tools
ve-font | design
wikibase1 | sciencesource
wikicitevis-prod | wikicitevis
wikifarm | pluggableauth
women-in-red | globaleducation
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce