On 5/8/05, Tim Starling <t.starling(a)physics.unimelb.edu.au> wrote:
Ray Saintonge wrote:
...
> I would suggest that any new major backup colo in North America
> should be in a different power grid area. This would protect from the
> possible consequences of a major blackout such as occurred in 2003. As
> I understand the situation North America has two major power grids, one
> in the East and one in the West. Only Texas and Quebec have independent
> grids.
While both Texas and California are very well connected, this solves a
non-problem. The entire east coast power grid did not fail in 2003,
and never has. If it ever did, we'd have bigger problems on our hands
than not being able to edit Wikipedia articles.
I was thinking more along the lines of battery backup to tide us over
until the generator comes online, and a redundant network uplink to
guard against switch failure. As I tried to explain, a second US colo
would lead to reduced performance, unless it was within 100km of the
current colo.
Any commercial generator solution includes battery support for the
transition. Generators are worthless if you lose power for five
minutes while the motors warm up.
Redundant network uplinks are a must, and along with them a proper
load balancing solution. This isn't difficult, as we both know. I
concur with the prediction of reduced performance for a second
American colo, *given the current scheme*.
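
For a rough sense of why distance matters, here is a minimal sketch of
the propagation-delay arithmetic. It assumes light travels through fiber
at about 200,000 km/s (roughly two-thirds of c) and ignores routing and
queuing delay. The function name and the sample distances are
illustrative, not figures from this thread, and propagation delay is
only one possible factor behind the 100km figure.

```python
# Assumption: signal speed in optical fiber is about 2/3 of c, ~200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over a fiber path, in ms."""
    return 2.0 * distance_km / FIBER_SPEED_KM_PER_S * 1000.0


if __name__ == "__main__":
    # Hypothetical path lengths between two colos; real fiber routes are
    # longer than the straight-line distance.
    for km in (100, 1000, 4000):
        print(f"{km:>5} km: at least {round_trip_ms(km):.1f} ms per round trip")
```

Any operation that needs several sequential round trips between the two
sites multiplies this floor accordingly.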
Putting half your hardware on one grid and half on another means you
lose half your capacity if the power goes off. There is no need for
this. There is already a diesel generator on site; we just need to cover
various kinds of short-term failure. We've seen two short-term failures:
a main circuit breaker trip and a power strip circuit breaker trip. The
power strip failure could have been prevented by having a proper PDU
with independent breakers; I believe one is now on order. Various
threats to the main power, including the main circuit breaker, can be
dealt with by supplying the DB servers with a UPS, and negotiating with
the colo to ensure that their supply is fully redundant. The main power
failure apparently only lasted for a matter of seconds; if they don't
intend to guard against such short failures, then we need to make our
own arrangements.
Agreed on all counts.
-- Tim Starling
--
main(){char*s="2)$%\3404(%\3407!,253\312",i=0;
while(i["]&&down==!up&&s["]&&putchar('@'+(i++)
[s]));return!i;} //
http://www.austinhair.org/