Hi all,
As you probably know, Wikipedia was down earlier, apparently due to a power fault at the colo centre. I was chatting to brion on IRC and he recommended that I contact this list.
I think it should be possible to make Wikipedia fully resilient to outages of individual data centres, without it being too expensive. Here's how.
Get a BGP-portable IP address range. Advertise this range from TWO locations, at separate data centres. Have basically identical read-only servers at each location, with the same IP addresses. Don't worry about IP conflicts: the servers are identical, and the shortest route from any given client will point to just one data centre. That route will not move unless that data centre goes down, at which point traffic will automatically fall back to the other.
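To make the failover behaviour concrete, here is a rough Python sketch of the idea (this setup is usually called anycast). It is only a toy model: the site names and hop counts are made up, and real BGP route selection happens in routers and weighs more than path length. It just shows that each client lands on one site, and moves to the other only when the first stops advertising.

```python
from typing import Dict, Optional


def pick_site(path_lengths: Dict[str, int], up: Dict[str, bool]) -> Optional[str]:
    """Return the advertising site with the shortest path, or None if all are down.

    path_lengths: hypothetical hop count from one client to each site.
    up: whether each site is currently advertising the shared range.
    """
    candidates = {site: hops for site, hops in path_lengths.items() if up.get(site, False)}
    if not candidates:
        return None
    # Shortest path wins; ties are broken by site name so the result is stable.
    return min(candidates, key=lambda site: (candidates[site], site))


# Hypothetical client that is closer to data centre A than to data centre B.
client_paths = {"dc_a": 3, "dc_b": 5}
status = {"dc_a": True, "dc_b": True}

before_outage = pick_site(client_paths, status)  # the nearer site, dc_a

status["dc_a"] = False  # dc_a loses power and withdraws its advertisement
after_outage = pick_site(client_paths, status)   # traffic falls back to dc_b
```

The client never changes the address it connects to; only the route behind that address changes.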
Under normal conditions, your load is shared between both data centres, so you don't need to actually increase the number of servers. If one goes down, all requests go to the other, so performance might drop, but Wikipedia should stay up.
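As a back-of-the-envelope check on the performance drop, the following sketch works out the load per server before and during an outage. The request rate and server counts are invented for illustration and are not Wikipedia's real figures; it also assumes load spreads evenly over all live servers.

```python
from typing import Dict, Set


def per_server_load(total_rps: float, servers_per_site: Dict[str, int], up_sites: Set[str]) -> float:
    """Requests per second per server, assuming an even spread over live servers."""
    live_servers = sum(count for site, count in servers_per_site.items() if site in up_sites)
    if live_servers == 0:
        raise ValueError("no data centre is up")
    return total_rps / live_servers


# Hypothetical numbers: the same 20 servers, split evenly over two sites.
servers = {"dc_a": 10, "dc_b": 10}
total_rps = 10_000.0

normal = per_server_load(total_rps, servers, {"dc_a", "dc_b"})  # 500.0
degraded = per_server_load(total_rps, servers, {"dc_b"})        # 1000.0
```

With an even split, losing one site doubles the load on the survivors, so the question is whether each site can cope with double its usual load for the length of an outage.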
This only works for read-only servers, so the process of editing Wikipedia would still rely on one of the groups (or some subset of servers in that group) being masters, and all the other servers being slaves that sync off those masters.
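The read/write split could look something like the sketch below. It is only an assumption about how requests might be classified (by HTTP method), and the Site type and route_request function are hypothetical names, not anything in the existing Wikipedia setup.

```python
from dataclasses import dataclass

# Assumption: requests that do not modify data can be identified by HTTP method.
READ_METHODS = {"GET", "HEAD"}


@dataclass(frozen=True)
class Site:
    name: str
    is_master: bool


def route_request(method: str, local: Site, master: Site) -> str:
    """Serve reads from whichever site received the request; send writes to the master."""
    if method.upper() in READ_METHODS:
        return local.name
    return master.name


master_site = Site(name="dc_a", is_master=True)
slave_site = Site(name="dc_b", is_master=False)

read_target = route_request("GET", local=slave_site, master=master_site)    # "dc_b"
write_target = route_request("POST", local=slave_site, master=master_site)  # "dc_a"
```

In this model, if the master's data centre goes down, reads keep working from the other site but edits fail until a master is available again.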
It's just a suggestion; I'd be interested to hear what you think.
If you are interested, I know a hosting company that has a BGP-portable range (I used to work for them), and I could talk to them about whether they can set up redundant IP tunnelling for that range to whatever IP addresses (VPN endpoints) you want, so you wouldn't even need to have your own BGP range.
Cheers, Chris.