[Labs-admin] Ops meeting followup
Andrew Bogott
abogott at wikimedia.org
Mon Jan 23 18:03:41 UTC 2017
Etherpad: https://etherpad.wikimedia.org/p/TechOps-2017-01-23
We chatted a bit about the labstore1004 outage. This may not be news to
anyone else, but Faidon adjusted the network config of 1004 and 1005
shortly after the outage; until then, each server had been bouncing back
and forth between two different IPs. Multiple people agreed that this is
a likely cause of the skyrocketing load that we saw before the failover.
Recruiting is in progress for a new general Ops team member.
Those are the only things that stand out!
-A