There will be two major Toolforge outages this coming week. Each
outage will cause tool downtime and may require manual restarts
afterwards.

The first outage is an NFS migration [0] and will take place on
Monday, beginning at around 00:00 UTC and lasting well into the
day, possibly as late as 19:00 UTC. During this period, Toolforge
NFS will be read-only. This will cause most tools (for example,
anything that writes a log file) to fail.
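
Tools that write a log file on NFS could, as one option, fall back
to standard error when the log file cannot be opened for writing.
The following is a minimal Python sketch, not an official Toolforge
recipe; the helper name open_log_stream is hypothetical, and it
assumes that opening a file for appending on the read-only mount
fails with EROFS.

```python
import errno
import sys
from typing import TextIO


def open_log_stream(path: str) -> TextIO:
    """Open `path` for appending, or fall back to stderr on a read-only filesystem.

    Hypothetical helper: assumes that during the NFS read-only window,
    opening a file for writing raises OSError with errno EROFS.
    Any other error (missing directory, permissions, ...) is re-raised.
    """
    try:
        return open(path, "a", encoding="utf-8")
    except OSError as exc:
        if exc.errno == errno.EROFS:
            print(f"warning: {path} is on a read-only filesystem; logging to stderr",
                  file=sys.stderr)
            return sys.stderr
        raise
```

Usage, with a hypothetical path: log = open_log_stream(
"/data/project/mytool/mytool.log"), then print("started", file=log,
flush=True). This only covers the failure at open time; a log file
that was already open when NFS became read-only may still raise an
error on a later write.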

The second outage will be a database migration [1] and will take
place on Thursday at 17:00 UTC. During this window, ToolsDB will be
read-only. The migration should take about an hour, but unexpected
side effects may extend the downtime.

We try very hard to avoid outages of this magnitude, but at this
point we need to choose downtime over the increasing risk of data
loss.

More details can be found below.

[0] NFS outage and system reboots on Monday: The existing Toolforge
NFS server runs on aging hardware and lacks a straightforward path
for maintenance or upgrades. To improve this, we are moving NFS to
a Cinder+VM platform, which should support easier upgrades,
migrations, and expansions in the future. To maintain data
integrity during the migration, the old server will be made
read-only while the last set of file changes is synchronized with
the new server. Because the NFS service is actively used, this
final sync will take many hours.

To ensure stable mounts of the new server, every node in Toolforge
will be rebooted as part of this migration. This means that even
tools which do not use NFS will be affected, although most tools
should restart gracefully.

[1] DB outage on Thursday: As part of the ongoing effort to upgrade
user-created Toolforge databases, we will migrate ToolsDB to a new
VM running more recent versions of Debian and MariaDB and using a
more resilient storage solution.

The new VM is ready, and we plan to point all tools to it on
April 6, 2023 at 17:00 UTC.
This will involve about one hour of read-only time for the
database. Any existing database connection will be terminated, and
if your tool does not reconnect automatically, you might have to
restart it manually.
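
One way for a tool to reconnect automatically is to wrap its
connect call in a retry loop with exponential backoff. The Python
sketch below is driver-agnostic and illustrative only: connect is
whatever function your database driver uses to open a connection,
and retry_on should be set to the exception class(es) your driver
raises for a lost or refused connection (an assumption; many
drivers do not raise OSError for this).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_reconnect(connect: Callable[[], T],
                   attempts: int = 5,
                   initial_delay: float = 1.0,
                   max_delay: float = 60.0,
                   retry_on: Tuple[Type[BaseException], ...] = (OSError,)) -> T:
    """Call `connect()` until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once `attempts` tries have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # give up: surface the last connection error
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise AssertionError("unreachable")  # the loop always returns or raises
```

A tool would call this both at startup and whenever a query fails
with a connection error. Note that reconnecting does not make writes
succeed during the read-only hour; it only avoids a manual restart
once the new server accepts connections.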

An email will be sent shortly before the migration starts and again
when it is finished.

Please also make sure your tool connects to the database using the
canonical hostname tools.db.svc.wikimedia.cloud and not any other
hostname or IP address.
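
As an illustration, a tool could keep the canonical hostname in one
place and read only the credentials from a my.cnf-style file. This
Python sketch assumes the credentials are in the [client] section of
~/replica.my.cnf (the usual Toolforge location, but check your own
tool); the function name toolsdb_connect_kwargs is hypothetical.

```python
import configparser
import os
from typing import Dict

# Canonical ToolsDB hostname from this announcement; avoid IPs or other aliases.
TOOLSDB_HOST = "tools.db.svc.wikimedia.cloud"


def toolsdb_connect_kwargs(
        cnf_path: str = os.path.expanduser("~/replica.my.cnf")) -> Dict[str, str]:
    """Return host/user/password keyword arguments for a MariaDB/MySQL client.

    Assumes a [client] section with `user` and `password` keys. Interpolation
    is disabled so that '%' in a password is read literally, and surrounding
    quotes are stripped in case the file uses them.
    """
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cnf_path):
        raise FileNotFoundError(cnf_path)
    client = parser["client"]
    return {
        "host": TOOLSDB_HOST,
        "user": client["user"].strip("'\""),
        "password": client["password"].strip("'\""),
    }
```

With PyMySQL, for example (an assumption; other drivers accept
similar keywords), this could be used as
pymysql.connect(**toolsdb_connect_kwargs(), database=...).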