Unfortunately, the tools NFS server was affected, and some processes
got hung (see
).
We are restarting some of the Toolforge nodes to get them unstuck, and
we'll get things back to a fully working state soon.
On Mon, 2023-05-15 at 17:34 +0200, David Caro wrote:
Update
The restart was not as smooth as expected. After a minute of trying to
shut the ports down to prepare for it, the storage cluster started
having trouble, which caused some instability for the virtual machines
(slow disk writes, network connectivity issues, ...).
Everything should be back up, and we will gather details and investigate
the incident to prevent it from happening again.
Let us know if you are still seeing issues by opening a ticket or
pinging us on IRC:
https://wikitech.wikimedia.org/wiki/Help:Toolforge#Communication_and_support
Thanks for your patience!
On Mon, 2023-05-15 at 13:55 +0200, David Caro wrote:
Hi!
We are restarting a switch[1] today at 13:00 UTC.
We are moving all the affected VMs to different hypervisors, and we
expect no downtime, though you might find the servers a bit
unresponsive at the moment the migration finally moves each VM (a
couple of seconds).
We will reply to this email once it's done.
Thanks!
[1] https://phabricator.wikimedia.org/T316544
---
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share
in
the sum of all knowledge. That's our commitment."