I will be upgrading the cloud-vps OpenStack install on Monday afternoon
my time (beginning around 18:00 UTC). Here's what to expect:
- Intermittent Horizon and API downtime (maybe an hour or two total)
- Inability to schedule new VMs (also for an hour or two)
- Some mild Horizon dashboard changes as I'll also be upgrading the
dashboards to version 'Zen'.
Toolforge users will be unaffected by this outage. Existing, running
services and VMs on cloud-vps should also be unaffected.
-Andrew + the WMCS team
Hi there!
On 2022-11-28 and 2022-11-29 some misleading emails were sent: you may
have received one (or more) emails about puppet failures on your
Cloud VPS virtual machine.
Moreover, such emails were a bit contradictory, with messages like
"No failed resources", and "No exceptions happened".
There was a problem in the way the puppet errors were calculated, which
has now been fixed [0].
This does not affect Toolforge.
sorry for the noise,
regards.
[0] https://gerrit.wikimedia.org/r/c/operations/puppet/+/861805/
--
Arturo Borrero Gonzalez
Senior Site Reliability Engineer
Wikimedia Cloud Services
Wikimedia Foundation
Hi there,
Today 2022-11-22 at about 12:25 UTC, as part of a routine operation, I
reimaged/reformatted a cloudvirt hypervisor without relocating all the
virtual machines first.
The data survived the reimage, but the 32 (!) affected virtual machines
were briefly unavailable and then hard-rebooted.
All virtual machines are now ACTIVE (up and running) from the OpenStack
point of view, but please let me know if you need assistance recovering
them in any way.
As of this writing we don't have any automation to ensure we only
reimage empty hypervisors, but we're working on it, to prevent this kind
of human error in the future.
regards. (and sorry!)
(!) Affected virtual machines are:
- ID: 78782628-4f9f-4263-84fc-06e767b3bfe1
Name: mx-wiki
- ID: 1fa9f0d9-42e8-4273-bdb1-a7d49998c13f
Name: synapse01
- ID: 2382fda0-e683-4d0c-95b6-bbbf323904d9
Name: canary1048-04
- ID: 4b570277-e51f-459d-bea2-394c5ad7bc92
Name: tools-sgeexec-10-16
- ID: 66529c1b-f3a3-4ff8-b30d-785f4f274965
Name: feature-store-test
- ID: e153f69a-46a0-458a-ab50-de3d86aa861b
Name: toolsbeta-test-k8s-worker-7
- ID: c3a2d1a9-f811-4da9-afba-3a113c8ff729
Name: wbregistry-02
- ID: 2b56c575-08a5-4def-87cb-bee5bd43e4f9
Name: prod
- ID: 141ac13c-f0fa-46d3-9d2a-cede8bc854c6
Name: devtools-puppetdb1001
- ID: fdb15c24-0b41-42d6-9c4a-82afd1d9dcb9
Name: tools-sgeweblight-10-31
- ID: 56e55a31-8d32-455e-b650-b7194e71d2fd
Name: runner-1023
- ID: cb4a87e4-264e-4c8f-8197-3efff54346de
Name: runner-1022
- ID: 5b6b5733-565d-456e-a4fc-85ce669d3fd2
Name: deployment-mdb02
- ID: 75dce76d-36ad-4f9e-85e9-8a11ff6744db
Name: wikibase-product-testing-2022
- ID: 868d3dca-3e5c-4089-89a9-2c7e756c3e31
Name: toolsbeta-cumin-1
- ID: 42ac6d8a-453a-4620-b4b7-9c97994c98fb
Name: integration-agent-docker-1030
- ID: 084da652-503d-49a7-9ffa-98a0cd5335fd
Name: toolsbeta-sgeexec-10-5
- ID: f098fe82-18b6-49a9-962d-9b8f1f989b14
Name: pcc-worker1001
- ID: 8eb272dc-8006-4e93-a966-5035809324d9
Name: deployment-mx03
- ID: e67d0e4c-e07c-4d9a-8ddb-cb0bc8efa388
Name: deployment-docker-api-gateway01
- ID: b958511a-10cb-4e62-bdbb-6da5013dd62f
Name: soweego
- ID: 62045cf9-59ed-44b9-a268-1c9f171b5aae
Name: tools-package-builder-04
- ID: 0127e905-f52e-4ed4-b60d-260102a8e625
Name: pontoon-lb-02
- ID: 827bf744-262a-458b-951d-f2e9a377e075
Name: toolsbeta-test-k8s-ingress-3
- ID: 3e6c31d7-b4db-4a5f-a610-a74d0013f631
Name: pki-test01
- ID: 8893ba32-fb5c-4567-a242-b6c676978b7d
Name: deployment-urldownloader03
- ID: f72e5b18-6376-4ccd-9e59-64447759e53f
Name: deployment-deploy03
- ID: 006dea0a-a1eb-4de3-bf45-1a071ad87152
Name: kafka-test-cloud-2
- ID: e05220d7-8ca1-4d9f-a933-01a843286ea8
Name: toolsbeta-docker-imagebuilder-01
- ID: 416f445a-cad4-45c2-b32e-f17100f93eac
Name: cloud-puppetmaster-05
- ID: 4e492051-25a3-4442-b8b9-1959f42917fe
Name: tools-k8s-worker-76
- ID: df18863a-2da7-4951-aa69-936b3d889592
Name: deployment-docker-cpjobqueue01
--
Arturo Borrero Gonzalez
Senior Site Reliability Engineer
Wikimedia Cloud Services
Wikimedia Foundation
During the week of November 28 the following paths in PAWS will no longer exist:
/public/dumps/incr
/public/dumps/pagecounts-all-sites
/public/dumps/pagecounts-raw
/public/dumps/pageviews
All of these are links to the same directories under:
/public/dumps/public/other/
Please update any scripts to use the full path under /public/dumps/public/other/ before then.
https://github.com/toolforge/paws/pull/225
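
For notebooks that build these paths in code, the change could be handled with a small helper like the one below. This is only a sketch: the function name migrate_dump_path and the constants are hypothetical, and it assumes each removed link maps to a directory of the same name under /public/dumps/public/other/, as described above.

```python
from pathlib import PurePosixPath

# Hypothetical names for illustration; only the paths come from the announcement.
OLD_ROOT = PurePosixPath("/public/dumps")
NEW_ROOT = PurePosixPath("/public/dumps/public/other")
REMOVED_LINKS = {"incr", "pagecounts-all-sites", "pagecounts-raw", "pageviews"}


def migrate_dump_path(path: str) -> str:
    """Rewrite a path that goes through a removed link to its full location.

    Paths that do not start with one of the removed links are returned unchanged.
    """
    try:
        relative = PurePosixPath(path).relative_to(OLD_ROOT)
    except ValueError:
        # Not under /public/dumps at all.
        return path
    if relative.parts and relative.parts[0] in REMOVED_LINKS:
        return str(NEW_ROOT / relative)
    return path
```

For example, migrate_dump_path("/public/dumps/pageviews/2022") would give "/public/dumps/public/other/pageviews/2022", while a path that already uses the full location is left alone.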
--
*Vivian Rook (They/Them)*
Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>