On 4/9/06, Chris Wilson <chris@qwirx.com> wrote:
Hi all,
I guess you probably know that Wikipedia was down earlier, due to a power fault at the colo centre (apparently). I was chatting to brion on IRC and he recommended that I contact this list.
I think it should be possible to make Wikipedia fully redundant to outages of individual data centres, and not too expensive. Here's how.
Get a BGP portable IP address range.
Yea, one of dem portable /25s.
Advertise this range from TWO locations, at separate data centres. Have basically identical read-only servers at each site, with the same IP addresses. Don't worry about IP conflicts, as the servers are identical, and the shortest route from any given client will point to just one data centre, and not move unless that data centre goes down, when it will automatically fall back to the other.
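A toy model of the selection behaviour proposed above, assuming each client simply reaches whichever advertising site has the shortest path and falls back when one site withdraws its route. The site names, client names and path lengths are hypothetical, and real BGP best-path selection involves far more than path length:

```python
# Toy model of anycast route selection: each client reaches whichever
# advertising site has the shortest path. Site names and path lengths
# below are hypothetical, for illustration only.
from typing import Dict, Optional


def pick_site(path_lengths: Dict[str, int], up: Dict[str, bool]) -> Optional[str]:
    """Return the reachable site with the shortest path, or None if all are down."""
    candidates = [(length, site) for site, length in path_lengths.items()
                  if up.get(site, False)]
    if not candidates:
        return None
    # min() on (length, site) tuples picks the shortest path; ties go to
    # the alphabetically first site name.
    return min(candidates)[1]


# Hypothetical path lengths (e.g. AS hops) from two clients to two sites.
clients = {
    "client_eu": {"site_a": 2, "site_b": 5},
    "client_us": {"site_a": 6, "site_b": 3},
}

status = {"site_a": True, "site_b": True}
for name, paths in clients.items():
    print(name, "->", pick_site(paths, status))  # eu -> site_a, us -> site_b

# If site_b withdraws its advertisement, every client falls back to site_a.
status["site_b"] = False
for name, paths in clients.items():
    print(name, "->", pick_site(paths, status))  # both -> site_a
```

The model captures only the steady-state idea; it says nothing about what happens to connections that are already open when the route moves.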
Er narf. No. Internet routing is not that stable.
If you anycast TCP on the public internet you *will* end up with oddball behavior as the routing topology changes: users get hung connections because the route changed out from under them. Getting such a thing working correctly is quite a bit more complex than you seem to think it is.
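A simplified sketch of why a route change breaks an open TCP connection under anycast, assuming each site keeps its own connection state and shares nothing with the other. The `Site` class and its methods are hypothetical. In this model the second site answers the unknown flow with a reset, as a TCP stack does for a segment on a nonexistent connection; in practice users may see the connection stall or drop:

```python
# Toy model of why anycast TCP can hang: connection state lives only on the
# site that accepted the handshake. Class and method names are hypothetical.
from dataclasses import dataclass, field
from typing import Set, Tuple

Conn = Tuple[str, int]  # (client address, client port)


@dataclass
class Site:
    name: str
    connections: Set[Conn] = field(default_factory=set)

    def handle_syn(self, conn: Conn) -> str:
        # A new handshake creates local state on this site only.
        self.connections.add(conn)
        return "SYN-ACK"

    def handle_segment(self, conn: Conn) -> str:
        # Mid-stream segments are only valid where the state exists;
        # an unknown connection is answered with a reset.
        return "ACK" if conn in self.connections else "RST"


site_a, site_b = Site("site_a"), Site("site_b")
conn = ("192.0.2.10", 51515)

route = site_a                      # initial shortest path
print(route.handle_syn(conn))       # SYN-ACK: handshake completes on site_a
print(route.handle_segment(conn))   # ACK
route = site_b                      # routing reconverges mid-connection
print(route.handle_segment(conn))   # RST: site_b has never seen this flow
```

Short HTTP requests often finish before a route moves, which is why such failures tend to look intermittent rather than total.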
Under normal conditions, your load is shared between both data centres, so you don't need to actually increase the number of servers. If one goes down, all requests go to the other, so performance might drop, but Wikipedia should stay up.
This only works for read-only servers, so the process of editing Wikipedia would still rely on one of the groups (or some subset of servers in that group) being masters, and all the other servers being slaves that sync off those masters.
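A minimal sketch of the master/slave split described above, assuming statements are routed by their leading keyword: reads rotate across replicas at either site, and everything else goes to the single master. The host names and the `QueryRouter` class are hypothetical, not a description of Wikipedia's actual setup:

```python
# Sketch of the read/write split: reads can be served by any replica at
# either site, but every write must go to the single master.
# Host names are hypothetical.
import itertools
from typing import Iterator, List


class QueryRouter:
    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # Round-robin over the read-only replicas.
        self._reads: Iterator[str] = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        """Return the host that should execute this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in ("SELECT", "SHOW"):
            return next(self._reads)
        # INSERT, UPDATE, DELETE, DDL, etc. must hit the master so that
        # replicas can pick the change up through replication.
        return self.master


router = QueryRouter("db-master.site-a", ["db-replica.site-a", "db-replica.site-b"])
print(router.route("SELECT title FROM articles"))  # db-replica.site-a
print(router.route("SELECT title FROM articles"))  # db-replica.site-b
print(router.route("UPDATE articles SET body = 'x' WHERE id = 1"))  # db-master.site-a
```

This sketch ignores replication lag: a read sent to the remote site right after an edit may not see that edit yet.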
Last I checked we still had issues getting MySQL replication working well across non-local networks.
It's just a suggestion, I'd be interested to hear what you think.
If you are interested, I know a hosting company that has a BGP-portable range (I used to work for them), and I could talk to them about whether they can set up redundant IP tunnelling for that range to whatever IP addresses (VPN endpoints) you want, so you wouldn't even need to have your own BGP range.
For what you propose, the portable block would have to carry all the normal Wikipedia traffic. I somehow suspect that they would rather not be tunneling several hundred Mbit/s of traffic. :)
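A back-of-envelope figure for the volume involved, assuming a sustained average of 300 Mbit/s as a stand-in for "several hundred" (an assumed number, not a measured one) over a 30-day month:

```python
# Back-of-envelope volume for tunnelling the whole site's traffic.
# 300 Mbit/s is an assumed stand-in for "several hundred Mbit/s".
RATE_MBIT_S = 300
SECONDS_PER_DAY = 86_400
DAYS = 30

bits = RATE_MBIT_S * 1_000_000 * SECONDS_PER_DAY * DAYS
terabytes = bits / 8 / 1e12  # decimal terabytes
print(f"{terabytes:.1f} TB per {DAYS} days")  # 97.2 TB per 30 days
```

Peak rates would be higher than a sustained average, so the real load on the hosting company's links would be correspondingly larger at busy times.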