On Tue, Feb 24, 2009 at 1:07 PM, Robert Ullmann <rlullmann@gmail.com> wrote:
Really? I mean is this for real?
The sequence ought to be something like: breaker trips, monitor shows within a minute or two that 4 servers are offline, and not scheduled to be. In the next 5 minutes someone looks at the server(s), notes that there is no AC power, walks directly to the panel and resets the breaker. How is this *not* done? I'm sorry, I just don't get it. I've run data centres, and it just is not possible to have servers down for AC power for more than a few minutes unless there is a fault one can't locate. (Or grid down, and running a subset on the generators ;-)
Can someone explain all this? Is the whole thing just completely beyond the resources available to manage it?
Constructive suggestions for improvement are far more welcome than complaints and outrage.
If you have no suggestions for improvement, it is perhaps more prudent to express concern that dumps are not working and to wait for a response. This is admittedly less fun than piecing together information and "lining up" those responsible for something not being operational.