On Apr 4, 2019 11:58 PM, Brooke Storm <bstorm@wikimedia.org> wrote:

At this point, rsync and database services should be fully restored on osmdb.eqiad.wmnet, which now points to osm.db.svc.eqiad.wmflabs (the cloud-internal address of the instance the service now runs on).
Work will continue on getting the replica in Cloud up and running, but that should not impact users of the server at all.

Apologies for the less-than-smooth transition. I'll add documentation and/or technical fixes to make future failovers smoother.

Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_

On Apr 4, 2019, at 10:48 AM, Brooke Storm <bstorm@wikimedia.org> wrote:

Sadly, a permissions issue caused a brief crash during the promotion of the replica to master. The database should now be up in read-write mode on the new server.

I'm now adjusting things to get the rsync jobs moved and working as well.

On Apr 4, 2019, at 10:11 AM, Brooke Storm <bstorm@wikimedia.org> wrote:

This is starting now.

On Apr 1, 2019, at 1:18 PM, Brooke Storm <bstorm@wikimedia.org> wrote:

The OSM PostgreSQL database service, usually accessed via osmdb.eqiad.wmnet, is moving to a new server. The new server is currently a read replica of the primary database and should be accessible via the DNS alias osm.db.svc.eqiad.wmflabs.

As detailed in https://phabricator.wikimedia.org/T219652, osmdb.eqiad.wmnet will be changed to point at osm.db.svc.eqiad.wmflabs. While the DNS change propagates, the tables that are normally writable will briefly be read-only as well. The replica will then be promoted to master, and the remaining steps should not cause any impact.