I have set ToolsDB to read-write again. It should be operating as normal now. I am continuing to work on the rebuild of the replica and am hoping that the replica will be able to catch up once the data import finishes. Thank you for your patience while we try to get this replicating again.
Brooke Storm Staff SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm
On Nov 11, 2020, at 3:23 PM, Brooke Storm <bstorm@wikimedia.org> wrote:
Update: I don’t think it is going to be done in 10 minutes. I’m surprised, but the process is still running. I still believe it will complete today.
On Nov 11, 2020, at 8:31 AM, Brooke Storm <bstorm@wikimedia.org> wrote:
Update: ToolsDB remains in read-only mode while the data loads on the replica, to minimize the chance that the rebuild fails. All steps have been successful so far, but the size and heterogeneity of this database service have made every step take a long time to complete. The last setup step is running now and should complete during my day today (US MST). Based on how long the data export took yesterday, my worst-case estimate for restoring service is around 22:30 UTC today.
We are tracking work at https://phabricator.wikimedia.org/T266587
On Nov 10, 2020, at 8:51 AM, Brooke Storm <bstorm@wikimedia.org> wrote:
This will be happening in around 10 minutes. ToolsDB will be read-only until we can get a consistent dump to rebuild replication.
Brooke Storm Staff SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm
On Nov 6, 2020, at 12:12 PM, Brooke Storm <bstorm@wikimedia.org> wrote:
The ToolsDB service suffered a breakage in replication on 2020-10-27. WMCS has tried to restore replication, including taking a dump to rebuild it without downtime, but those attempts have been unsuccessful so far. At this point, we have a new server waiting to become the replica, but to start the replication process, we need to set the database to read-only for a full dump. This could easily take more than an hour, and the database will be read-only for that entire time.
We will begin at 16:00 UTC and continue until the dump is done. The database is quite large, but with it in read-only mode, I hope the backup will not take terribly long.
Please see https://phabricator.wikimedia.org/T266587 for additional information.
Brooke Storm Staff SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm