The file system (Ceph) is now stable and we've turned all the VMs back on. And the downtime paid off! We see no evidence of data loss or corruption.
A few VMs have been a bit fussy about coming back up, so I encourage you to use 'Hard Reboot Instance' if yours is misbehaving. Toolforge should be back to normal. PAWS is still complaining but we hope to have it stabilized shortly.
Please follow up here or on IRC if you encounter unexpected issues.
-Andrew
On 2/13/23 10:55 AM, Andrew Bogott wrote:
We are having some very concerning instability with the cloud-vps file system. Out of an abundance of caution I have shut off EVERYTHING in cloud-vps to prevent rampant data corruption.
I don't expect this outage to last long but will notify when things start up again. Very sorry for the downtime!
-Andrew