Upcoming WMCS network outages: Tuesday May 15th - Wikitech-l

Andrew Bogott

14 May

5:02 p.m.

Reminder: this outage is happening tomorrow.

On 5/2/18 10:22 AM, Andrew Bogott wrote:

> As part of some long-deferred routine maintenance, we need to update (and, in one case, physically move) the network servers that handle all traffic between WMCS instances. During each change, all WMCS network traffic (including network access to all tools and VMs) will be interrupted for several minutes.
>
> The first outage will be:
>
> Tuesday, May 15 at 13:00 UTC
>
> The second outage will be three hours later:
>
> Tuesday, May 15 16:00 UTC
>
> In each case outages should last no more than ten to fifteen minutes.
>
> More details about this move can be found at
> https://phabricator.wikimedia.org/T193579 .
>
> -Andrew
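For readers outside UTC, the two announced windows can be converted to local time. The following is only a sketch: the year 2018 is taken from the message dates in this thread, the fifteen-minute length is the announcement's stated upper bound, and the fixed UTC-7 offset (Pacific Daylight Time) and the names `OUTAGE_STARTS_UTC`, `MAX_OUTAGE`, `LOCAL_TZ` and `local_window` are illustrative assumptions, not part of the announcement.

```python
from datetime import datetime, timedelta, timezone

# Outage start times from the announcement (year 2018 taken from the thread dates).
OUTAGE_STARTS_UTC = [
    datetime(2018, 5, 15, 13, 0, tzinfo=timezone.utc),
    datetime(2018, 5, 15, 16, 0, tzinfo=timezone.utc),
]

# Upper bound from the announcement: "no more than ten to fifteen minutes".
MAX_OUTAGE = timedelta(minutes=15)

# Assumption: a fixed UTC-7 offset (Pacific Daylight Time) stands in for the
# reader's local zone; replace it with your own offset.
LOCAL_TZ = timezone(timedelta(hours=-7), "PDT")


def local_window(start_utc, tz=LOCAL_TZ, max_len=MAX_OUTAGE):
    """Return (start, latest expected end) of an outage window in zone tz."""
    start_local = start_utc.astimezone(tz)
    return start_local, start_local + max_len


for start in OUTAGE_STARTS_UTC:
    begin, end = local_window(start)
    print(f"{begin:%a %b %d %H:%M} to {end:%H:%M} {begin.tzname()}")
```

A fixed offset keeps the sketch free of any time zone database dependency; a named zone would be the better choice for dates that cross a daylight-saving change.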


Andrew Bogott

1:33 p.m.

The first of these tasks is done and the network is back up and running. The outage lasted a bit less than 10 minutes. There will be another similar outage in a few hours.

-Andrew


Andrew Bogott

5:24 p.m.

We're leaving things in this in-between state (running network services through our backup host, labnet1002) for the duration. All services should be running as normal until further notice. Once we iron out the current unexpected issue there will be another interruption; I'll provide as much warning about that as I can. It's unlikely to be today, in any case.

Sorry for any inconvenience caused!

-Andrew

On 5/15/18 12:04 PM, Andrew Bogott wrote:

> Things are back up and running for the moment. The last switch-over went poorly so we haven't actually reached our goals yet; there may be another interruption yet coming up.
>
> -A


Andrew Bogott

16 May

3:34 p.m.

New subject: Upcoming WMCS network outages: Tuesday May 15th (DONE)

We had a couple of minutes of downtime just now, and everything is back up. This went a lot better today; this should be the last of these network interruptions for a while.

-Andrew

On 5/15/18 3:31 PM, Andrew Bogott wrote:

> The next step in this is scheduled for tomorrow at 15:00 UTC, 8:00 AM in SF. Again, all network service will be interrupted for 5-10 minutes.
>
> Sorry for all the emails! With luck there will only be one more.
>
> -Andrew
