[Labs-l] All Clear: Labs network work Wednesday 2015-08-19 21:00 UTC

Andrew Bogott abogott at wikimedia.org
Wed Aug 19 22:04:26 UTC 2015


This is done and everything should be back to normal.  Let me know if 
you encounter irregularities!

We will probably do one more fire drill in a week or two to verify that 
we have a switch-over process that's faster than today's was.  I'll warn 
appropriately if that happens.

-Andrew


On 8/19/15 3:52 PM, Andrew Bogott wrote:
> Reminder -- this will start in ten minutes.  Labs networks may stutter 
> or be temporarily unavailable during this work.
>
> -Andrew
>
>
> On 8/13/15 4:59 PM, Andrew Bogott wrote:
>> Next Wednesday Chase and I are going to have a go at updating our 
>> labs network node.  There may be intermittent network interruptions 
>> in communication both between labs instances and between labs and the 
>> outside world.
>>     No action should be required on the part of labs users unless you 
>> have jobs that will time out and die due to network failures. In any 
>> case, I will send an 'all clear' message at the end of the upgrade 
>> with details about what, if any, downtime ensued.
>>
>> == technical background ==
>>
>>     Labs is currently running with a single nova-network node, 
>> labnet1001.  It's proved fairly reliable, but labnet1001 is running 
>> an old OS (Ubuntu Precise) and is a single point of failure.
>>     During the maintenance window we will bring up a new nova-network 
>> node on labnet1002, running Ubuntu Trusty, and then switch existing 
>> labs traffic to the new node.  It may be possible to do this with 
>> minimal network interruption, but there are a few minor unknowns in 
>> our plan.  In any case, this migration will serve as a 
>> proof-of-concept for possible future emergency failovers.
>>     Presuming all goes well and things land stably on labnet1002, 
>> labnet1001 will be upgraded and maintained as a hot spare.
>>
>> -Andrew
>>
>



