[Labs-announce] Mild but long-running Tools outage in progress

Andrew Bogott abogott at wikimedia.org
Thu Jun 29 20:27:40 UTC 2017


     The Tools cluster is suffering from several maladies right now. 
Existing services seem to be mostly fine, but any Kubernetes services 
that tried to restart in the last few hours probably failed to start, 
and new services are still failing to start.  Similarly, web services and 
other tools are failing to restart in some cases.

     There are various theories as to what's going on; the most likely 
is a kernel-version incompatibility with the newly upgraded NFS 
server.  An earlier LDAP outage is better understood and 
should be resolved by now.

     We apologize for the inconvenience, and are working frantically to 
restore stability.  There will be a follow-up email when things are 
resolved.

-Andrew