TL;DR:
- c1.labsdb (labsdb1001.eqiad.wmnet) is down due to hardware issues
- *.labsdb are pointing to c3.labsdb (labsdb1003.eqiad.wmnet)
The physical server behind c1.labsdb (labsdb1001.eqiad.wmnet) experienced a hard drive failure around 2017-11-01T03:30 UTC. This failure is preventing the MySQL service on that host from starting. The *.labsdb service names that were pointed at that server have been updated to point to c3.labsdb (labsdb1003.eqiad.wmnet) instead.
See https://phabricator.wikimedia.org/T179464 for more information and additional updates.
Expect slower than normal performance as all traffic is handled by a single server. Now would be a great time to update the configuration for your tools to use the new database cluster [0][1].
[0]: https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_server...
[1]: https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown
Bryan
On Wed, Nov 1, 2017 at 9:42 AM, Bryan Davis bd808@wikimedia.org wrote:
TL;DR:
- c1.labsdb (labsdb1001.eqiad.wmnet) is down due to hardware issues
- *.labsdb are pointing to c3.labsdb (labsdb1003.eqiad.wmnet)
Manuel (one of our awesome DBAs) was able to get the MySQL server for c1.labsdb back up and running in read-only mode.
We do not know how much longer the failing SSD will survive, but until it fails again, users who have user-created databases on this host can attempt to connect and archive them. If the data you have stored there is something that you can recreate, you should probably do that instead.
Please move non-reproducible data to tools.db.svc.eqiad.wmflabs. c3.labsdb (labsdb1003.eqiad.wmnet) will be shut down on Wednesday 13 December 2017, so moving any data to c3 will gain you less than 6 weeks. You should also be working to update any tools that need to write data to a database so that they use tools.db.svc.eqiad.wmflabs. See https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown for more details.
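For anyone who wants to script the move described above, here is a minimal sketch in Python, not an official procedure. It wraps the standard mysqldump and mysql command-line clients. The two hostnames come from this thread. The credentials file path (~/replica.my.cnf) and the function names are assumptions made for illustration, so adjust them to match your own account.

```python
import subprocess
from pathlib import Path

# Hosts named in this thread.
OLD_HOST = "c1.labsdb"
NEW_HOST = "tools.db.svc.eqiad.wmflabs"

# Assumption: a MySQL option file holding your user and password.
# On Toolforge this is commonly ~/replica.my.cnf; adjust if yours differs.
DEFAULTS_FILE = Path.home() / "replica.my.cnf"


def dump_command(database, host=OLD_HOST, defaults_file=DEFAULTS_FILE):
    """Build the mysqldump argv for one user-created database."""
    return [
        "mysqldump",
        f"--defaults-file={defaults_file}",  # must come before other options
        f"--host={host}",
        database,
    ]


def load_command(database, host=NEW_HOST, defaults_file=DEFAULTS_FILE):
    """Build the mysql client argv that reads a SQL dump on stdin."""
    return ["mysql", f"--defaults-file={defaults_file}", f"--host={host}", database]


def archive(database, out_path):
    """Dump `database` from the old host into a local SQL file."""
    with open(out_path, "wb") as out:
        subprocess.run(dump_command(database), stdout=out, check=True)


def restore(database, dump_path):
    """Load a SQL file into an already existing database on the new host."""
    with open(dump_path, "rb") as dump:
        subprocess.run(load_command(database), stdin=dump, check=True)
```

With a hypothetical database name, archive("s12345__mydb", "mydb.sql") would write the dump locally. After creating a database of the same name on tools.db.svc.eqiad.wmflabs, restore("s12345__mydb", "mydb.sql") would load it there. Keeping the local SQL file also gives you an archive in case the drive fails again before the load finishes.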
Due to the failure of labsdb1001.eqiad.wmnet, the planned reboot of labsdb1003.eqiad.wmnet on Tuesday 07 November 2017 has been cancelled.
Bryan (on behalf of the Cloud Services and DBA teams)
Can you provide a list of tools/users impacted by the drive failure? Or is there a redundant drive covering for this?
Cyberpower678
English Wikipedia Account Creation Team
English Wikipedia Administrator
Global User Renamer
On Nov 2, 2017, at 19:48, Bryan Davis bd808@wikimedia.org wrote:
Bryan Davis Wikimedia Foundation bd808@wikimedia.org [[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA irc: bd808 v:415.839.6885 x6855
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
On Thu, Nov 2, 2017 at 6:13 PM, Maximilian Doerr maximilian.doerr@gmail.com wrote:
Can you provide a list of tools/users impacted by the drive failure? Or is there a redundant drive covering for this?
As long as c1 stays up, https://tools.wmflabs.org/tool-db-usage/ will show the users with user-owned databases there. These users should also all have received a MassMessage spam from me on their Wikitech talk page about a week ago.
There is no drive or data redundancy for user-created tables on c1.labsdb or c3.labsdb. The tools.db.svc.eqiad.wmflabs databases, however, are replicated to a secondary server. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups_and_Replication
Bryan
One clarification that has to be made: toolsdb and the new servers have both drive redundancy (RAID10) and host redundancy (3 servers serving 2 services, with proxies in the middle, for the replicas; 2 servers for 1 service for toolsdb). That is one of the many reasons why we encourage use of the new servers rather than the fragile old ones.
On Fri, Nov 3, 2017 at 2:13 AM, Bryan Davis bd808@wikimedia.org wrote: