Now cloudvirt1024 is dying in earnest, so VMs hosted there will be down
for a while as well. This is, as far as anyone can tell, just a stupid
coincidence.
So far it appears that we are going to be able to rescue /most/ things
without significant data loss. For now, though, there's going to be
plenty more downtime.
VMs on cloudvirt1024 are:
| 8113d2c5-6788-43f6-beeb-123b0b717af3 | drmf-beta                     | math
| 169b3260-4f7e-43dc-94c2-e699308a3426 | ecmabot                       | webperf
| 29e875e3-15d5-4f74-9716-c0025c2ea098 | encoding02                    | video
| 1b2b8b50-d463-4b7f-a3a9-6363eeb3ca8b | encoding03                    | video
| 5421f938-7a11-499c-bc6a-534da1f4e27d | hafnium                       | rcm
| 041d42b9-df36-4176-9f5d-a508989bbebc | hound-app-01                  | hound
| 6149375b-8a08-4f03-882a-6fc0f5f77499 | integration-slave-docker-1044 | integration
| 4d64b032-d93a-4a8c-a7e5-569c17e5063f | integration-slave-docker-1046 | integration
| ad48959a-9eb9-46a9-bec4-a2bf23cdf655 | integration-slave-docker-1047 | integration
| 21644632-0972-448f-83d0-b76f9d1d28e0 | ldfclient-new                 | wikidata-query
| c2a30fe0-2c87-4b01-be53-8e2a3d0f40a7 | math-docker                   | math
| df8f17fb-03fe-4725-b9cf-3d9fe76f4654 | mediawiki2latex               | collection-alt-renderer
| d73f36e6-7534-4910-9a6e-64a6b9088d1e | neon                          | rcm
| 2d035965-ba53-41b3-b6ef-d2ebbe50656a | novaadminmadethis             | quotatest
| c84f61c0-4fd2-47a5-b6ab-dd6b5ea98d41 | ores-puppetmaster-01          | ores
| 585bb328-8078-4437-b076-9e555683e27d | ores-sentinel-01              | ores
| 0538bfed-d7b5-4751-9431-8feecbaf78c0 | oxygen                        | rcm
| e8090d9e-7529-46a9-b1e1-c4ba523a2898 | packaging                     | thumbor
| c7fe4663-7f2b-4d23-a79b-1a2e01c80d93 | twlight-prod                  | twl
| 2370b38f-7a65-4ccf-a635-7a2fa5e12b3e | twlight-staging               | twl
| 464577c6-86f0-42f9-9c49-86f9ec9a0210 | twlight-tracker               | twl
| 5325322d-a57e-4a9b-85b7-37643f03bfea | wikidata-misc                 | wikidata-dev
On 2/13/19 11:23 AM, Andrew Bogott wrote:
Here's the latest:
cloudvirt1018 is up and running, and many of its VMs are fine. Many
other VMs are corrupted and won't start up. Some of those VMs will
probably be lost for good, but we're still investigating rescue options.
In the meantime, if your VM is up and you can access it then you're
in luck! If not, stay tuned.
-Andrew
On 2/13/19 9:15 AM, Andrew Bogott wrote:
I spoke too soon -- we're still working on this. Most of these VMs
will remain down in the meantime.
Sorry for the outage!
On 2/13/19 8:21 AM, Andrew Bogott wrote:
We don't fully understand what happened, but after Giovanni performed
a classic "turning it off and on again" things are now running without
warnings. The VMs listed below are now coming back online and
everything should be back up shortly.
We'll probably replace some of this hardware anyway, out of an
abundance of caution, but that's unlikely to produce further
downtime. With luck, this is the last you'll hear about this.
-Andrew
On 2/13/19 7:25 AM, Andrew Bogott wrote:
> We're currently experiencing a mysterious hardware failure in our
> datacenter -- three different SSDs failed overnight, two of them
> in cloudvirt1018 and one of them in cloudvirt1024. The VMs on
> 1018 are down entirely. We may move those on 1024 to another host
> shortly in order to guard against additional drive failure.
>
> There's some possibility that we will experience permanent data
> loss on cloudvirt1018, but everyone is working hard to avoid this.
>
> The following VMs are on cloudvirt1018:
>
>
> a11y | reading-web-staging
> abogott-scapserver | testlabs
> af-puppetdb01 | automation-framework
> api | openocr
> asdf | quotatest
> bastion-eqiad1-02 | bastion
> clm-test-01 | community-labs-monitoring
> compiler1002 | puppet-diffs
> cyberbot-exec-iabot-01 | cyberbot
> deployment-db03 | deployment-prep
> deployment-db04 | deployment-prep
> deployment-memc05 | deployment-prep
> deployment-pdfrender02 | deployment-prep
> deployment-sca01 | deployment-prep
> design-lsg3 | design
> eventmetrics-dev01 | eventmetrics
> fridolin | catgraph
> gtirloni-puppetmaster-01 | testlabs
> hadoop-master-3 | analytics
> ign | ign2commons
> integration-castor03 | integration
> integration-slave-docker-1017 | integration
> integration-slave-docker-1033 | integration
> integration-slave-docker-1038 | integration
> integration-slave-jessie-1003 | integration
> integration-slave-jessie-android | integration
> k8s-master-01 | general-k8s
> k8s-node-03 | general-k8s
> k8s-node-05 | general-k8s
> k8s-node-06 | general-k8s
> kdc | analytics
> labstash-jessie1 | logging
> language-mleb-legacy | language
> login-test | catgraph
> lsg-01 | design
> mathosphere | math
> mc-clusterA-1 | test-twemproxy
> mwoffliner5 | mwoffliner
> novaadminmadethis-4 | quotatest
> ntp-01 | cloudinfra
> ntp-02 | cloudinfra
> ogvjs-testing | ogvjs-integration
> phragile-pro | phragile
> planet-hotdog | planet
> pub2 | wikiapiary
> puppenmeister | planet
> puppet-compiler-v4-other | testlabs
> puppet-compiler-v4-tools | testlabs
> quarry-beta-01 | quarry
> signwriting-swis | signwriting
> signwriting-swserver | signwriting
> social-tools3 | social-tools
> striker-deploy04 | striker
> striker-puppet01 | striker
> t166878 | otrs
> togetherjs | visualeditor
> tools-sgebastion-06 | tools
> tools-sgeexec-0902 | tools
> tools-sgeexec-0903 | tools
> tools-sgewebgrid-generic-0901 | tools
> tools-sgewebgrid-lighttpd-0901 | tools
> ve-font | design
> wikibase1 | sciencesource
> wikicitevis-prod | wikicitevis
> wikifarm | pluggableauth
> women-in-red | globaleducation
>
>