Reminder: The first of these outages will start in about 30
minutes. Toolforge NFS will be read-only for as long as 18-19
hours.

There will be two major Toolforge outages this coming week. Each
outage will cause tool downtime and may require manual restarts
afterwards.

The first outage is an NFS migration [0] and will take place on
Monday, beginning at around 00:00 UTC and lasting well into the
day, possibly as late as 19:00 UTC. During this long period,
Toolforge NFS will be read-only, which will cause most tools (for
example, anything that writes a log file) to fail.

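Tools that log to NFS can be made somewhat more tolerant of a
read-only mount. The Python sketch below is an illustration only,
not part of the migration plan: the helper name make_log_handler,
the tool name "mytool", and the log path are hypothetical. It tries
to open the log file and falls back to stderr if opening fails with
an OSError, which is what a read-only file system raises. It only
covers opening the file (for example, when a tool is restarted
during the read-only window); writes to a file that was already
open are not handled here.

```python
import logging
import sys


def make_log_handler(path: str) -> logging.Handler:
    """Return a handler that logs to `path`, or to stderr if `path`
    cannot be opened (e.g. because NFS is mounted read-only)."""
    try:
        # FileHandler opens the file immediately (delay=False by default),
        # so a read-only or missing location raises OSError right here.
        return logging.FileHandler(path)
    except OSError as exc:
        # Keep the tool running: report the problem and log to stderr.
        print(f"cannot open {path} ({exc}); logging to stderr", file=sys.stderr)
        return logging.StreamHandler(sys.stderr)


if __name__ == "__main__":
    # Hypothetical tool name and log path, for illustration only.
    logger = logging.getLogger("mytool")
    logger.addHandler(make_log_handler("/data/project/mytool/mytool.log"))
    logger.warning("tool started")
```
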
The second outage is a database migration [1] and will take place
on Thursday at 17:00 UTC. During this window, ToolsDB will be
read-only. The migration should take about an hour, but
unexpected side effects may extend the downtime.

We try very hard to avoid outages of this magnitude, but at this
point we need to choose downtime over the increasing risk of data
loss.

More details can be found below.

[0] NFS outage and system reboots on Monday: The existing Toolforge
NFS server runs on aging hardware and lacks a straightforward path
for maintenance or upgrades. To improve this, we are moving NFS to
a Cinder + VM platform, which should make future upgrades,
migrations, and expansions easier. To maintain data integrity
during the migration, the old server must be made read-only while
the last set of file changes is synchronized to the new server.
Because the NFS service is actively used, the final sync will take
many hours to complete.

To ensure stable mounts of the new server, every node in Toolforge
will be rebooted as part of this migration. This means that even
tools which do not use NFS will be affected, although most tools
should restart gracefully.

[1] DB outage on Thursday: As part of the ongoing effort to upgrade
user-created Toolforge databases, we will migrate ToolsDB to a new
VM with more recent versions of Debian and MariaDB and a more
resilient storage solution.

The new VM is ready, and we plan to point all tools to it on
April 6, 2023 at 17:00 UTC.

This will involve about one hour of read-only time for the
database. All existing database connections will be terminated,
and if your tool does not reconnect automatically, you may have to
restart it manually.

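For tools that do not reconnect automatically, one common approach
is to wrap database work in a retry loop with exponential backoff.
The following is a minimal Python sketch under stated assumptions:
the helper with_retries and its parameters are hypothetical, and it
retries only on the exception types passed in retry_on (the
built-in ConnectionError by default). With a real MySQL driver you
would pass that driver's connection error instead; for PyMySQL this
is assumed to be pymysql.OperationalError.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff when it raises
    one of `retry_on`. The last error is re-raised after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The operation passed in should open a fresh connection each time it
is called, so a retry after the server terminated the connection
does not reuse the dead one. Note that this only rides out short
interruptions such as dropped connections; with the defaults it
gives up after about 15 seconds and will not wait out the full hour
of read-only time.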
An email will be sent shortly before the migration starts and
again when it is finished.

Please also make sure your tool connects to the database using the
canonical hostname tools.db.svc.wikimedia.cloud and not any other
hostname or IP address.
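
One way to follow this is to define the hostname in a single place
instead of scattering it through your code. The Python sketch below
assumes, as is common on Toolforge, that the tool's database
credentials live in ~/replica.my.cnf under a [client] section with
user and password keys; the helper name toolsdb_params is
hypothetical. It always returns the canonical hostname, whatever
else the credentials file contains.

```python
import configparser
import os
from typing import Dict

# Canonical ToolsDB hostname, as given in this announcement.
TOOLSDB_HOST = "tools.db.svc.wikimedia.cloud"


def toolsdb_params(cnf_path: str = "~/replica.my.cnf") -> Dict[str, str]:
    """Return connection parameters for ToolsDB. The host is always the
    canonical service name, never an IP address or another hostname."""
    path = os.path.expanduser(cnf_path)
    # interpolation=None keeps '%' in passwords literal; allow_no_value
    # tolerates bare option lines that my.cnf files sometimes contain.
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    # read() silently skips missing files, so check its return value.
    if not parser.read(path):
        raise FileNotFoundError(path)
    client = parser["client"]
    return {
        "host": TOOLSDB_HOST,
        "user": client["user"],
        "password": client["password"],
    }
```

The returned dict can be passed as keyword arguments to a MySQL
driver, for example pymysql.connect(**toolsdb_params()) with
PyMySQL.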