ToolsDB is finally back AND replicated.
—Brooke

On Dec 16, 2020, at 9:04 AM, Brooke Storm <bstorm@wikimedia.org> wrote:

This is happening in about an hour. We will be taking ToolsDB down for maintenance.

Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_


On Dec 8, 2020, at 4:05 PM, Brooke Storm <bstorm@wikimedia.org> wrote:

In yet another effort to restore replication and preserve the redundancy of the data in ToolsDB (the user-writable database in Toolforge), we need to take the database (tools.db.svc.eqiad.wmflabs) completely offline at 1700 UTC on 16 Dec. Apps that depend on the ToolsDB service will fail during the outage, which will take at least an hour. We aren’t sure exactly how long it will last, so expect multiple hours. This will be much faster than the last outage because we are doing a straight copy of the binary database files between the servers. Details of this mess and the efforts to restore replication can be found at https://phabricator.wikimedia.org/T266587

If we succeed in producing a viable copy of the database on another system, we will also perform an upgrade on the hypervisor it is on before closing the maintenance period. That should take an additional hour or so.

We appreciate your patience with this process. It is very important that we establish a second copy of this database, especially in light of recent crashes (https://phabricator.wikimedia.org/T253738).

Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_