Hi there,
Today, 2022-11-22, at about 12:25 UTC, as part of a routine operation, I
reimaged/reformatted a cloudvirt hypervisor without relocating all the
virtual machines first.
The data survived the reimage, but the 32 (!) affected virtual machines
were briefly unavailable and then hard-rebooted.
All virtual machines are now ACTIVE (up and running) from the OpenStack
point of view, but please let me know if you need assistance recovering
them in any way.
As of this writing we don't have any automation to ensure we only
reimage empty hypervisors, but we're working on it to prevent this kind
of human error in the future.
Regards (and sorry!),
(!) Affected virtual machines are:
- ID: 78782628-4f9f-4263-84fc-06e767b3bfe1
Name: mx-wiki
- ID: 1fa9f0d9-42e8-4273-bdb1-a7d49998c13f
Name: synapse01
- ID: 2382fda0-e683-4d0c-95b6-bbbf323904d9
Name: canary1048-04
- ID: 4b570277-e51f-459d-bea2-394c5ad7bc92
Name: tools-sgeexec-10-16
- ID: 66529c1b-f3a3-4ff8-b30d-785f4f274965
Name: feature-store-test
- ID: e153f69a-46a0-458a-ab50-de3d86aa861b
Name: toolsbeta-test-k8s-worker-7
- ID: c3a2d1a9-f811-4da9-afba-3a113c8ff729
Name: wbregistry-02
- ID: 2b56c575-08a5-4def-87cb-bee5bd43e4f9
Name: prod
- ID: 141ac13c-f0fa-46d3-9d2a-cede8bc854c6
Name: devtools-puppetdb1001
- ID: fdb15c24-0b41-42d6-9c4a-82afd1d9dcb9
Name: tools-sgeweblight-10-31
- ID: 56e55a31-8d32-455e-b650-b7194e71d2fd
Name: runner-1023
- ID: cb4a87e4-264e-4c8f-8197-3efff54346de
Name: runner-1022
- ID: 5b6b5733-565d-456e-a4fc-85ce669d3fd2
Name: deployment-mdb02
- ID: 75dce76d-36ad-4f9e-85e9-8a11ff6744db
Name: wikibase-product-testing-2022
- ID: 868d3dca-3e5c-4089-89a9-2c7e756c3e31
Name: toolsbeta-cumin-1
- ID: 42ac6d8a-453a-4620-b4b7-9c97994c98fb
Name: integration-agent-docker-1030
- ID: 084da652-503d-49a7-9ffa-98a0cd5335fd
Name: toolsbeta-sgeexec-10-5
- ID: f098fe82-18b6-49a9-962d-9b8f1f989b14
Name: pcc-worker1001
- ID: 8eb272dc-8006-4e93-a966-5035809324d9
Name: deployment-mx03
- ID: e67d0e4c-e07c-4d9a-8ddb-cb0bc8efa388
Name: deployment-docker-api-gateway01
- ID: b958511a-10cb-4e62-bdbb-6da5013dd62f
Name: soweego
- ID: 62045cf9-59ed-44b9-a268-1c9f171b5aae
Name: tools-package-builder-04
- ID: 0127e905-f52e-4ed4-b60d-260102a8e625
Name: pontoon-lb-02
- ID: 827bf744-262a-458b-951d-f2e9a377e075
Name: toolsbeta-test-k8s-ingress-3
- ID: 3e6c31d7-b4db-4a5f-a610-a74d0013f631
Name: pki-test01
- ID: 8893ba32-fb5c-4567-a242-b6c676978b7d
Name: deployment-urldownloader03
- ID: f72e5b18-6376-4ccd-9e59-64447759e53f
Name: deployment-deploy03
- ID: 006dea0a-a1eb-4de3-bf45-1a071ad87152
Name: kafka-test-cloud-2
- ID: e05220d7-8ca1-4d9f-a933-01a843286ea8
Name: toolsbeta-docker-imagebuilder-01
- ID: 416f445a-cad4-45c2-b32e-f17100f93eac
Name: cloud-puppetmaster-05
- ID: 4e492051-25a3-4442-b8b9-1959f42917fe
Name: tools-k8s-worker-76
- ID: df18863a-2da7-4951-aa69-936b3d889592
Name: deployment-docker-cpjobqueue01
--
Arturo Borrero Gonzalez
Senior Site Reliability Engineer
Wikimedia Cloud Services
Wikimedia Foundation