[Labs-l] Yet another partial labs outage

Andrew Bogott abogott at wikimedia.org
Sat May 16 04:31:28 UTC 2015


The hardware curse continues!

One of the labs virt hosts (labvirt1003) is running very hot tonight,
presumably due to a broken fan.  It is intermittently scaling the CPU
speed way back to avoid melting; when that happens there are bound to be
lots of side effects such as unresponsive instances, clock drift, and the
like (not least of which is that right now I can't ssh into the damn
thing or get performance metrics).

Naturally this started happening late on a Friday, so it may be a while
before I can get someone into the datacenter.  I'm leaving the host up in
the meantime, based on the notion that half a server is better than
none, but poor performance is likely to be the norm until it's fixed.

I did shut off one instance:  wikidata-wdq-mm.  I don't have a personal 
grudge, but it was gobbling CPU cycles and the system really needs a 
rest.  If loss of that instance is a disaster for anyone, contact me and 
I'll see if I can revive it and shut off ten or so other instances to 
make room.

Updates as events warrant!

-Andrew


