Since no one expressed concerns about this, I'm going to go ahead and
roll this out tomorrow morning at 16:00 UTC. Here's what to expect:
1) If your VM mounts secondary-scratch but doesn't actually use it,
nothing much will happen.
2) If your VM or tool has an open file on that volume when the
switchover happens, it will probably freeze up. I will reboot VMs that
this happens to.
3) If you had files on the scratch volume before this change, they will
be gone after the change. Precious files will be recoverable after the
fact for a few weeks.
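
If you want to check ahead of time whether case 2 applies to your VM, here
is a rough sketch (not an official WMCS tool) that lists processes holding
open files under the scratch mount. It assumes a Linux /proc filesystem and
the usual /mnt/nfs/secondary-scratch mount point; the function names are made
up for illustration. It only looks at open file descriptors, so a process
whose working directory is on scratch, or one that has memory-mapped a file
there, will not show up. Run it as root to see other users' processes.

```python
import os

# Mount point named in this announcement; adjust if your VM mounts it elsewhere.
SCRATCH = "/mnt/nfs/secondary-scratch"


def is_under(path, mount):
    """True if path is the mount point itself or lies beneath it."""
    mount = mount.rstrip("/")
    return path == mount or path.startswith(mount + "/")


def open_files_under(mount=SCRATCH, proc_root="/proc"):
    """Return {pid: [paths]} for processes with open files under mount.

    Only the symlinks in <proc_root>/<pid>/fd are read; the files on the
    mount itself are never opened.
    """
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if is_under(target, mount):
                found.setdefault(int(entry), []).append(target)
    return found


if __name__ == "__main__":
    for pid, paths in sorted(open_files_under().items()):
        print(pid, *paths)
```

If this prints nothing, no process it could inspect had a file open on
scratch at that moment.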
-Andrew
On 1/14/22 2:06 PM, Andrew Bogott wrote:
Hello, all!
We are in the process of re-engineering and virtualizing[0] the NFS
service provided to Toolforge and VMs. The transition will be rocky
and involve some service interruption... I'm still running tests to
determine exactly how much disruption will be required.
The first volume that I'd like to replace is 'scratch,' typically
mounted as /mnt/nfs/secondary-scratch. I'm seeking feedback about how
vital scratch uptime is to your current workflow, and how disruptive
it would be to lose data there.
If you have a project or tool that uses scratch, please respond with
your thoughts! My preference would be to wipe out all existing data on
scratch and also subject users to several unannounced periods of
downtime, but I also don't want anyone to suffer. If you have
important/persistent data on that volume then the WMCS team will work
with you to migrate that data somewhere safer, and if you have an
important workflow that will break due to scratch downtime then I'll
work harder on devising a low-impact roll-out.
Thank you!
-Andrew
[0] https://phabricator.wikimedia.org/T291405
_______________________________________________
Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.…