The file system (Ceph) is now stable and we've turned all the VMs back
on. And the downtime paid off! We see no evidence of data loss or
corruption.
A few VMs have been a bit fussy about coming back up, so I encourage you
to 'Hard Reboot Instance' if you are seeing bad behavior. Toolforge
should be back to normal. PAWS is still complaining but we hope to have
it stabilized shortly.
Please follow up here or on IRC if you encounter unexpected issues.
-Andrew
On 2/13/23 10:55 AM, Andrew Bogott wrote:
We are having some very concerning instability with the cloud-vps file
system. Out of an abundance of caution I have shut off EVERYTHING in
cloud-vps to prevent rampant data corruption.
I don't expect this outage to last long but will notify you when things
start up again. Very sorry for the downtime!
-Andrew