On Dec 16, 2020, at 9:04 AM, Brooke Storm
<bstorm(a)wikimedia.org> wrote:
This is happening in about an hour. We will be taking ToolsDB down for maintenance.
Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
On Dec 8, 2020, at 4:05 PM, Brooke Storm
<bstorm(a)wikimedia.org> wrote:
In yet another effort to restore replication and preserve the redundancy of the data in
ToolsDB (the user-writable database in Toolforge), we need to take the database
(tools.db.svc.eqiad.wmflabs) completely offline at 1700 UTC on 16 Dec. Apps that depend on
the ToolsDB service will fail during the outage, which will take at least an hour. We
aren’t sure exactly how long it will last, so expect multiple hours. This will be much faster
than the last outage because we are doing a straight copy of the binary database files
between the servers. Details of this mess and efforts to restore the replication service
can be found at
https://phabricator.wikimedia.org/T266587
If we succeed in producing a viable copy of the database on another system, we will also
perform an upgrade on the hypervisor it is on before closing the maintenance period. That
should be an additional hour or so.
We appreciate your patience with this process. It is very important that we establish a
second copy of this database, especially in light of recent crashes
(https://phabricator.wikimedia.org/T253738).
Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_