I have 3 things here that need to be installed in the colo. I'll be there in about 2 hours, which means around noon Florida time.
1. SCSI RAID card: MegaRAID SCSI 320-1LP with battery backup (neat!). I'm a bit nervous about this, because the driver installation instructions are very strange and un-Linux-like. There's a program called RWFLOPPY.EXE that I might have to use. What is this, 1981?
2. 1U rackmount keyboard/monitor and 16-port KVM switch (nice!)
3. APC powerport, donated during our pledge drive, new in box
Installing the SCSI RAID card will mean some downtime today for suda (the new database server, a 2U Opteron), while installing the powerport will mean a few minutes of downtime for the others.
The new facility is staffed 24 hours a day, so in theory, anyway, we don't need the powerport: we can have a human do the reboot. But adding people to the list of authorized contacts is harder than giving people access to the power port, so it's still worthwhile.
The power port has 8 outlets, but we have 9 machines and 10 plugs. We therefore need to prioritize what is on the powerport.
1. suda (database server)
2. suda (database server)
3. ?
4. ?
5. ?
6. ?
7. ?
8. zwenger (mail server, right?)

The prioritization depends on (a) the likelihood of the machine actually going down and (b) how critical that machine is to running Wikipedia.
We're going to have failover protection for the webservers and the squids, so maybe that's it.
Remember, here's the chain of events that has to happen before we experience major regret about not having the right thing on the APC:
1. A machine has to die in such a way that a hard boot is necessary.
2. The 24x7 staff at the colo has all vanished.
3. I am out of town and can't drive over there (but I guess if there is no staff there, I can't get in anyway).
4. The machine in question is not on the power port.
We could experience minor regret, though, if the machine isn't on the power port and someone has to call a human to reboot it, which takes 5 minutes instead of 5 seconds.
--Jimbo
On Fri, 06 Feb 2004 07:11:19 -0800, Jimmy Wales wrote:
> The power port has 8 outlets, but we have 9 machines and 10 plugs. We therefore need to prioritize what is on the powerport.
> We're going to have failover protection for the webservers and the squids, so maybe that's it.
I would vote for the DB, the mail server, the Squids, and then as many Apaches as possible. There are enough Apaches that if one goes down, it makes relatively little difference to the remaining redundancy and performance.
If one Squid is down, we have a single point of failure.
Gabriel Wicke wrote:
> On Fri, 06 Feb 2004 07:11:19 -0800, Jimmy Wales wrote:
>> The power port has 8 outlets, but we have 9 machines and 10 plugs. We therefore need to prioritize what is on the powerport.
>> We're going to have failover protection for the webservers and the squids, so maybe that's it.
> I would vote for the DB, the mail server, the Squids, and then as many Apaches as possible. There are enough Apaches that if one goes down, it makes relatively little difference to the remaining redundancy and performance.
> If one Squid is down, we have a single point of failure.
O.k., I'll do that. Only 2 machines will not be on the power port, so the bulk of everything is going to be just fine.
--Jimbo
Just to clarify: the machine that'll be down is one that we're _not using_ on the live site yet.
So everything should be fine. :)
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org