In yet another effort to restore replication and preserve the redundancy of the data in ToolsDB (the user-writable database in Toolforge), we need to take the database (tools.db.svc.eqiad.wmflabs) completely offline at 1700 UTC on 16 Dec. Apps that depend on the ToolsDB service will fail during the outage. It will last at least an hour, and since we aren’t sure exactly how long it will take, expect multiple hours. This will still be much faster than the last outage, because we are doing a straight copy of the binary database files between the servers. Details of this mess and of our efforts to restore the replication service can be found at https://phabricator.wikimedia.org/T266587
If we succeed in producing a viable copy of the database on another system, we will also upgrade the hypervisor it runs on before closing the maintenance period. That should take an additional hour or so.
We appreciate your patience with this process. It is very important that we establish a second copy of this database, especially in light of recent crashes (https://phabricator.wikimedia.org/T253738).
Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
This is happening in about an hour. We will be taking ToolsDB down for maintenance.
(In reply to the Dec 8, 2020, 4:05 PM announcement from Brooke Storm, bstorm@wikimedia.org, shown above.)
ToolsDB is finally back AND replicated.
Brooke
(In reply to the Dec 16, 2020, 9:04 AM reminder from Brooke Storm, bstorm@wikimedia.org, shown above.)