A redundant power supply upgrade in the datacenter this week could affect the VM that ToolsDB runs on. To protect data in case anything goes wrong, we anticipate a brief outage of the MySQL service on Thursday 10/24 at 11am UTC. You may need to restart your tool so that it reconnects to the database. We do not anticipate any worse disruption, but if there is any disruption beyond what is planned, a failover may be necessary, and that failover will not include the non-replicated tables mentioned here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups_and_Replication
The maintenance requiring this notice and action is detailed at https://phabricator.wikimedia.org/T227540. The VM resides on the cloudvirt1019 hypervisor, which is why it is in scope.
We sincerely apologize for the short notice.
Brooke Storm
Senior SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_
_______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Reminder, this is happening in a few minutes!
An entirely surprising side effect of this maintenance is causing chronic database instability. We're working to resolve this, but in the meantime the tools database server is likely to go up and down several times. We'll send an update once things are stable again.
Sorry for the (ongoing) interruption!
-Andrew + wmcs team
Thanks to a last-minute intervention by Jaime Crespo, ToolsDB is back to working as normal. Some context can be found at https://phabricator.wikimedia.org/T236384
-Andrew + wmcs team
Two-plus hours later I’m still seeing frequent DB errors.
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
On Thu, Oct 24, 2019 at 12:24 PM Russell Blau russblau@imapmail.org wrote:
Two-plus hours later I’m still seeing frequent DB errors.
You are very correct. The 'all clear' from Andrew seemed like the right thing at the time, but we are still having stability issues with the ToolsDB service.
We have an active tracking task at https://phabricator.wikimedia.org/T236420, where we are adding information for those who want more details. Multiple engineers on multiple continents are currently working to restore ToolsDB to normal function, but we do not yet have an estimate of when things will be stable.
Bryan