I have set ToolsDB to read-write again. It should be operating as normal now. I am continuing to work on the rebuild of the replica and am hoping that the replica will be able to catch up once the data import finishes. Thank you for your patience while we try to get this replicating again.

Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm

On Nov 11, 2020, at 3:23 PM, Brooke Storm <bstorm@wikimedia.org> wrote:

Update: I don’t think it is going to be done in 10 minutes. I’m surprised, but the process is still running. I still believe it will complete today.

On Nov 11, 2020, at 8:31 AM, Brooke Storm <bstorm@wikimedia.org> wrote:

Update:
ToolsDB remains in read-only mode while the data loads on the replica, in order to minimize the possibility that this won’t work. All steps have been successful so far, but the size and heterogeneity of this database service have made every step take a long time to complete. The last setup step is running now and should complete during my day today (US MST). Based on how long the data export took yesterday, my worst-case estimate for restoring service is around 22:30 UTC today.

We are tracking work at https://phabricator.wikimedia.org/T266587

On Nov 10, 2020, at 8:51 AM, Brooke Storm <bstorm@wikimedia.org> wrote:

This will be happening in around 10 minutes. ToolsDB will be read-only until we can get a consistent dump to rebuild replication.

Brooke Storm
Staff SRE
Wikimedia Cloud Services
IRC: bstorm

On Nov 6, 2020, at 12:12 PM, Brooke Storm <bstorm@wikimedia.org> wrote:

The ToolsDB service suffered a breakage in replication on 2020-10-27. WMCS has tried to restore replication, but those attempts have been unsuccessful so far, including a dump taken to rebuild replication without downtime.
At this point, we have a new server waiting to become the replica, but to start the replication process, we need to set the database to read-only for a full dump. This could easily take more than an hour. During that entire time, the database will be read-only.

We will begin at 16:00 UTC and finish when it is done. The database is quite large, but with it in read-only mode, I hope the backup will not take terribly long.

Please see https://phabricator.wikimedia.org/T266587 for additional information.

Brooke Storm
Staff SRE
Wikimedia Cloud Services
IRC: bstorm