[Labs-announce] IMPORTANT: Security reboots everywhere tomorrow (DONE)

Andrew Bogott abogott at wikimedia.org
Wed Jun 21 20:19:46 UTC 2017


Reboots are done for today.  Please let me know if you find things that 
are broken as a result.

Thank you for your patience!

-Andrew


On 6/21/17 12:48 PM, Andrew Bogott wrote:
> Good news:
>
> I've now finished rebooting all the labvirt hosts, and the last few 
> VMs should be in the process of spinning back up right now.
>
> Bad news:
>
> - We have a lot more (non-VM-hosting) servers to restart!  Expect 
> brief outages in storage, Horizon, Wikitech, etc. during the next few 
> hours.
>
> - Most VMs are still running insecure kernels.  I haven't upgraded 
> them because it's not yet clear whether the current generation of 
> security patches will break some Java apps.  Once that issue is 
> resolved there will be another round of reboots, which I will 
> announce in advance.
>
> - The restart process hit k8s nodes a bit harder than planned, so if 
> your tools are running on Kubernetes you might want to verify that 
> things are up and running properly.  If they aren't, a simple restart 
> should resolve things.
>
>
>
>
> On 6/21/17 8:56 AM, Andrew Bogott wrote:
>> Reminder:  These reboots will start in about 5 minutes.
>>
>> On 6/20/17 11:38 AM, Andrew Bogott wrote:
>>> Good morning!
>>>
>>> In order to plug a newly-revealed security hole, we need to upgrade 
>>> kernels everywhere and reboot everything.  I'll be doing this for 
>>> all instances and VM hosts tomorrow, 2017-06-21, beginning at 
>>> 14:00 UTC (aka 7 AM in San Francisco).
>>>
>>> Project admins should expect instances to be shut down for 10-15 
>>> minutes and then restarted at some point tomorrow.  I don't expect 
>>> the process to take absolutely all day, but it might.  If you have 
>>> an instance that is a special case and you need it to be restarted 
>>> during a more specific window, please respond to me directly with 
>>> details about what you need and we'll try to make it happen.
>>>
>>> These reboots should have minimal impact on things running in Tools, 
>>> although it's always good to keep an eye out -- some tools are 
>>> unable to recover from a shift between nodes and need a manual restart.
>>>
>>> Sorry for the inconvenience!
>>>
>>> -Andrew
>>>
>>>
>>
>



