[Labs-l] All Clear: Labs network work Wednesday 2015-08-19 21:00 UTC
Andrew Bogott
abogott at wikimedia.org
Wed Aug 19 22:04:26 UTC 2015
This is done and everything should be back to normal. Let me know if
you encounter irregularities!
We will probably do one more fire drill in a week or two to verify that
we have a switch-over process that's faster than today's was. I'll send
a warning ahead of time if that happens.
-Andrew
On 8/19/15 3:52 PM, Andrew Bogott wrote:
> Reminder -- this will start in ten minutes. Labs networks may stutter
> or be temporarily unavailable during this work.
>
> -Andrew
>
>
> On 8/13/15 4:59 PM, Andrew Bogott wrote:
>> Next Wednesday Chase and I are going to have a go at updating our
>> labs network node. There may be intermittent interruptions in
>> communication, both between labs instances and between labs and the
>> outside world.
>> No action should be required on the part of labs users unless you
>> have jobs that will time out and die due to network failures. In any
>> case, I will send an 'all clear' message at the end of the upgrade
>> with details about what, if any, downtime ensued.
>>
>> == technical background ==
>>
>> Labs is currently running with a single nova-network node,
>> labnet1001. It's proved fairly reliable, but labnet1001 is running
>> an old OS (Ubuntu Precise) and is a single point of failure.
>> During the maintenance window we will bring up a new nova-network
>> node on labnet1002, running Ubuntu Trusty, and then switch existing
>> labs traffic to the new node. It may be possible to do this with
>> minimal network interruption, but there are a few minor unknowns in
>> our plan. In any case, this migration will serve as a
>> proof-of-concept for possible future emergency failovers.
>> Presuming all goes well and things land stably on labnet1002,
>> labnet1001 will be upgraded and maintained as a hot spare.
>>
>> -Andrew
>>
>