This went pretty well. I had to reboot three VMs:
wcqs-beta-01.wikidata-query.eqiad1.wikimedia.cloud
maps-wmanew.maps.eqiad1.wikimedia.cloud
tools-sgeexec-0913.tools.eqiad1.wikimedia.cloud
That last one probably caused a few grid jobs to be restarted.
Please let me know if you encounter any bad behavior with this new NFS mount; it's a test case for future NFS migrations so I'm very interested in how well this one works.
-Andrew
On 1/18/22 8:59 AM, Andrew Bogott wrote:
Since no one expressed concerns about this, I'm going to go ahead and roll this out tomorrow morning at 16:00 UTC. Here's what to expect:
1) If your VM mounts secondary-scratch but doesn't actually use it, nothing much will happen.
2) If your VM or tool has an open file on that volume when the switchover happens, it will probably freeze up. I will reboot VMs that this happens to. (See the check below for a way to spot open files ahead of time.)
3) If you had files on the scratch volume before this change, they will be gone after the change. Precious files will be recoverable after the fact for a few weeks.
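If you want to know ahead of time whether a VM of yours will be affected, something along these lines should work. It's just a rough sketch that walks /proc on a Linux VM (run it as root if you want to see other users' processes), not an official tool; the mountpoint is the usual /mnt/nfs/secondary-scratch:

    #!/usr/bin/env python3
    # Rough sketch: report whether the scratch volume is mounted here and
    # which processes currently hold files open on it.
    import os

    MOUNTPOINT = "/mnt/nfs/secondary-scratch"

    def is_mounted(path):
        """Return True if `path` appears as a mount point in /proc/mounts."""
        with open("/proc/mounts") as mounts:
            return any(line.split()[1] == path for line in mounts)

    def open_files_under(path):
        """Yield (pid, file) pairs for open files below `path`, best effort."""
        for pid in filter(str.isdigit, os.listdir("/proc")):
            fd_dir = f"/proc/{pid}/fd"
            try:
                for fd in os.listdir(fd_dir):
                    target = os.readlink(os.path.join(fd_dir, fd))
                    if target == path or target.startswith(path + "/"):
                        yield pid, target
            except OSError:
                continue  # process exited or fds not readable; skip it

    if __name__ == "__main__":
        if not is_mounted(MOUNTPOINT):
            print(f"{MOUNTPOINT} is not mounted here; nothing to do.")
        else:
            hits = list(open_files_under(MOUNTPOINT))
            for pid, target in hits:
                print(f"pid {pid} has {target} open")
            if not hits:
                print(f"{MOUNTPOINT} is mounted but no open files were found.")

If it reports open files, expect that VM or tool to freeze at switchover time.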
-Andrew
On 1/14/22 2:06 PM, Andrew Bogott wrote:
Hello, all!
We are in the process of re-engineering and virtualizing[0] the NFS service provided to Toolforge and VMs. The transition will be rocky and involve some service interruption; I'm still running tests to determine exactly how much disruption will be required.
The first volume that I'd like to replace is 'scratch,' typically mounted as /mnt/nfs/secondary-scratch. I'm seeking feedback about how vital scratch uptime is to your current workflow, and how disruptive it would be to lose data there.
If you have a project or tool that uses scratch, please respond with your thoughts! My preference would be to wipe out all existing data on scratch and also subject users to several unannounced periods of downtime, but I also don't want anyone to suffer. If you have important/persistent data on that volume, the WMCS team will work with you to migrate that data somewhere safer; if you have an important workflow that will break due to scratch downtime, I'll work harder on devising a low-impact rollout.
Thank you!
-Andrew