[Labs-admin] Ops meeting followup

Andrew Bogott abogott at wikimedia.org
Mon Jan 23 18:03:41 UTC 2017


Etherpad:  https://etherpad.wikimedia.org/p/TechOps-2017-01-23

We chatted a bit about the labstore1004 outage.  This may not be news to
anyone else, but Faidon adjusted the network config of 1004 and 1005
shortly after the outage; each server had been bouncing back and forth
between two different IPs.  Multiple people agreed that this is a likely
cause of the skyrocketing load that we saw before the failover.

Recruiting is in progress for a new general Ops team member.

Those are the only things that stand out!

-A




