Yes, all of the things listed will affect tools as well as servers.
Affected tools will most likely get restarted and un-stuck as a
consequence of exec/k8s node reboots.
On Tue, Jan 18, 2022 at 18:00, Andrew Bogott wrote:
Since no one expressed concerns about this, I'm going to go ahead and
roll this out tomorrow morning at 16:00 UTC. Here's what to expect:
1) If your VM mounts secondary-scratch but doesn't actually use it,
nothing much will happen
2) If your VM or tool has an open file on that volume when the
switchover happens, it will probably freeze up. I will reboot VMs
this happens to.
3) If you had files on the scratch volume before this change, they will
be gone after the change. Precious files will be recoverable after the
fact for a few weeks.
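
To see whether case 2 applies to a given VM before the switchover, one option is to look for open file descriptors that point into the scratch mount. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function name find_open_files and its proc_root parameter are made up for illustration and are not part of any WMCS tooling. Without root it will only see processes owned by the current user.

```python
import os


def find_open_files(mount_point, proc_root="/proc"):
    """Return sorted (pid, path) pairs for open files under mount_point.

    Hypothetical helper: walks <proc_root>/<pid>/fd and reads each
    symlink, keeping those whose target lies below mount_point.
    """
    # Trailing separator so "/mnt/x" does not also match "/mnt/xyz".
    prefix = os.path.join(os.path.abspath(mount_point), "")
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target.startswith(prefix):
                hits.append((int(entry), target))
    return sorted(hits)
```

Calling find_open_files("/mnt/nfs/secondary-scratch") and getting an empty list suggests nothing visible to you is holding files open on the volume at that moment; it says nothing about processes that open files later.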
On 1/14/22 2:06 PM, Andrew Bogott wrote:
We are in the process of re-engineering and virtualizing the NFS
service provided to Toolforge and VMs. The transition will be rocky
and involve some service interruption... I'm still running tests to
determine exactly how much disruption will be required.
The first volume that I'd like to replace is 'scratch,' typically
mounted as /mnt/nfs/secondary-scratch. I'm seeking feedback about how
vital scratch uptime is to your current workflow, and how disruptive
it would be to lose data there.
If you have a project or tool that uses scratch, please respond with
your thoughts! My preference would be to wipe out scratch and also
subject users to several unannounced periods of downtime, but I also
don't want anyone to suffer. If you have
important/persistent data on that volume then the WMCS team will work
with you to migrate that data somewhere safer, and if you have an
important workflow that will break due to scratch downtime then I'll
work harder on devising a low-impact roll-out.
Cloud-announce mailing list -- cloud-announce(a)lists.wikimedia.org
Cloud mailing list -- cloud(a)lists.wikimedia.org