Gabriel Wicke wrote:
On Tue, 06 Jan 2004 15:09:01 -0800, Jimmy Wales wrote:
Gabriel Wicke wrote:
[diagram fragment: Bind DNS round robin (multiple A records)]
How well does DNS round robin work in practice as compared to a proper load balancer?
I think it's the solution with the least control over how the traffic actually gets distributed. On the other hand, the Squids might not be the bottleneck anytime soon, and by the time they are it would probably make sense to install a second, possibly distributed, Squid layer. It might also save some time, because the work to install Heartbeat would only need to be done once, on the Squids. Once a second Squid layer is installed, separate Linux Directors are no longer needed (and the Squids will need Heartbeat then). Except maybe for balancing the database, but that's probably a different setup altogether.
I've never tried DNS round robin because people say it sucks, but I have no actual knowledge from first-hand experience.
Me neither.
I do know that load balancing using the tools that linuxvirtualserver.org talks about works great, and gives a very good and predictable level of control.
I've asked the Squid guys about the load-balancing setup: if ICP is disabled (Apache doesn't support it), Squid does a dumb round robin without weighting. With ICP enabled (as a dumb echo on port 7) it will take the weighting into account (I'm waiting for an answer on how this works out if the ICP round trips are all very quick, as they would be with a plain port 7 echo). The best solution would be an ICP daemon that delays its responses depending on the system load; then the system load would be 100% the same on all Apaches. If the Apaches are all similar in performance this is an academic question anyway. Linux Director also does round robin with static weighting unless feedbackd is installed.
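As a rough illustration of that idea (untested; the port number and the delay factor are made-up placeholders, and binding the real echo port 7 would need root), such a load-sensitive echo daemon could be a few lines of Python:

#!/usr/bin/env python3
# Minimal sketch of a load-sensitive UDP echo daemon: the Squids would measure
# the round-trip time to this port and weight the parent accordingly.
# The port and the delay-per-load factor are arbitrary placeholders.
import os
import socket
import time

PORT = 7007              # placeholder; binding the real echo port 7 needs root
DELAY_PER_LOAD = 0.01    # extra seconds of delay per unit of 1-minute load average

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", PORT))

while True:
    data, addr = sock.recvfrom(1024)
    load1 = os.getloadavg()[0]           # 1-minute load average
    time.sleep(load1 * DELAY_PER_LOAD)   # a busier box answers later -> lower weight
    sock.sendto(data, addr)              # echo the payload back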
Gabriel Wicke
Here's another way of doing it without the load balancers: have a pool of say 36 IP addresses (chosen because lots of numbers exactly or approximately divide into it: other numbers would do fine), and do simple DNS round-robin to balance user traffic across these. Browsers and DNS caches will cache DNS lookups, but we don't care, so long as traffic is distributed roughly evenly across the IP addresses.
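For concreteness, here is a throwaway sketch (the hostname and the 10.0.0.x addresses are made-up placeholders, not any real setup) that prints such a round-robin record set for a BIND zone file:

# Throwaway sketch: print 36 A records for the round-robin pool.
# Hostname and address range are made-up placeholders.
POOL_SIZE = 36
for i in range(1, POOL_SIZE + 1):
    print("www    IN  A    10.0.0.%d" % i)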
Now, have N << 36 Squid servers, and allocate the IP addresses _dynamically_ to the servers. If they are all on the same Ethernet, this can be done using ARP.
Very roughly: first, configure every machine to serve requests on every IP address in the shared pool, as logical interfaces on its Ethernet card. Requests for an address will be seen so long as the corresponding logical interface is configured "up" on that machine.
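A minimal sketch of that initial configuration, assuming classic ifconfig alias naming (eth0:0, eth0:1, ...) and a made-up 10.0.0.x/24 pool:

# Sketch: bring up every pool address as an ifconfig alias on eth0.
# The 10.0.0.1-36 pool and the /24 netmask are made-up placeholders.
import subprocess

POOL = ["10.0.0.%d" % i for i in range(1, 37)]

for n, ip in enumerate(POOL):
    subprocess.call(["ifconfig", "eth0:%d" % n, ip,
                     "netmask", "255.255.255.0", "up"])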
Now start a polling process (sketched in code below):
* every machine checks whether there is a live machine at each IP address in the pool, at some reasonable pseudo-randomized interval, by doing an ARP request
* if it sees that another machine as well as itself is serving an IP address from the pool, it takes down that logical interface, to prevent allocation collisions
* when it sees that nobody is serving an address, it looks at the number of addresses assigned to each machine (which it knows from its ARP scans)
* am I the machine with the lowest MAC address among the set of least-assigned machines? --> take over the IP by calling 'ifconfig' for that interface
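A rough sketch of that election loop, not a finished daemon: it assumes iputils "arping" for the ARP probe, eth0:N alias naming and the same made-up 10.0.0.x pool; MAC addresses are compared lexically, and the collision case (taking an alias down when another MAC also answers for it) is only noted in a comment.

#!/usr/bin/env python3
# Election loop sketch: probe every pool address, count who serves what,
# and claim an unserved address if we are the lowest MAC among the
# least-assigned machines.
import random
import subprocess
import time

IFACE = "eth0"
POOL = ["10.0.0.%d" % i for i in range(1, 37)]     # made-up address pool
MY_MAC = open("/sys/class/net/%s/address" % IFACE).read().strip()
my_ips = set()                                     # pool addresses whose alias we have up

def arp_probe(ip):
    """Return the MAC answering an ARP request for ip, or None if nobody does."""
    out = subprocess.run(["arping", "-c", "1", "-I", IFACE, ip],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "[" in line and "]" in line:            # iputils prints "... [MAC] ..."
            return line.split("[")[1].split("]")[0].lower()
    return None

def take_over(ip):
    """Bring up the alias for ip so this machine starts answering for it."""
    idx = POOL.index(ip)
    subprocess.call(["ifconfig", "%s:%d" % (IFACE, idx), ip, "up"])
    my_ips.add(ip)

while True:
    # Scan the pool: which MAC currently answers each address?
    owners = {ip: (MY_MAC if ip in my_ips else arp_probe(ip)) for ip in POOL}
    orphans = [ip for ip, mac in owners.items() if mac is None]

    # Addresses served per machine, as seen from the ARP scan.
    counts = {}
    for mac in owners.values():
        if mac is not None:
            counts[mac] = counts.get(mac, 0) + 1
    least = min(counts.values()) if counts else 0
    least_loaded = sorted(m for m, c in counts.items() if c == least)
    my_count = counts.get(MY_MAC, 0)

    # Claim at most one orphan per cycle, and only if we are the lowest MAC
    # among the least-assigned machines.  (The collision check, i.e. taking an
    # alias down when another MAC also answers for it, would go here too.)
    if orphans and (my_count < least or
                    (my_count == least and
                     (not least_loaded or MY_MAC <= least_loaded[0]))):
        take_over(orphans[0])

    time.sleep(random.uniform(0.5, 1.5))           # pseudo-random interval breaks symmetry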
The interval is pseudo-random to break symmetry, preventing synchronization by entrainment and thus making races very unlikely. (Note that races are self-fixing; see below.)
Excessively loaded machines could also _take down_ logical interfaces, to allow other machines to take up the load. This would likewise be done on the basis of exceeding a "fair share"; a damping mechanism would be needed to stop excessive re-balancing and to aid convergence. (Margin too small to contain the details of the formula for stable damping thresholds and timeouts.)
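A small sketch of that shedding rule with a crude damping term; all the numbers and the helper names here are made-up placeholders:

# Shed at most one address per cooldown period, and only while both over a
# load threshold and above a fair share of the pool.
import os
import time

LOAD_LIMIT = 8.0      # 1-minute load average above which we consider shedding
COOLDOWN = 120.0      # minimum seconds between shed events (the damping)
POOL_SIZE = 36
N_MACHINES = 4        # assumed cluster size, so fair share = 36 / 4 = 9

def shed_one_if_overloaded(my_ips, drop_alias, last_shed):
    """Return the new last_shed timestamp; drop_alias(ip) takes one alias down."""
    fair_share = POOL_SIZE / N_MACHINES
    overloaded = os.getloadavg()[0] > LOAD_LIMIT
    damped = time.time() - last_shed < COOLDOWN
    if overloaded and len(my_ips) > fair_share and not damped:
        drop_alias(my_ips.pop())
        return time.time()
    return last_shed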
Upside: simple and distributed. Race conditions will be very rare (p<0.001), and rapidly corrected. If we poll each IP address once a second from each machine, a new server will be allocated to a fallen IP address within substantially less than a minute (it gets less as N gets bigger). Overhead on each machine: user-space doing an ARP probe once a second, the kernel listening to N ARP requests per second.
Downside: when an IP address is transferred from one machine to another, any HTTP transfer in progress will time out. However, as this should only occur when nodes fail, or when new or rebooted nodes are brought up within the cluster, this is probably acceptable.
Doing things this way could possibly eliminate the need for the load-balancing boxes completely, and the scripts are no more complex than the heartbeat script.
Possible problem: partial failure of a system, so that it responds to network requests but not to application-level requests. Two possible resolutions (a watchdog sketch follows below):
* a watchdog-like process which performs local tests and, in the event of failure, shuts down all logical interfaces in the pool on that machine, allowing them to be picked up by other machines
* remote monitoring, and remote operator intervention via SSH or the power strip, if the above fails (automated STONITH does not make sense...)
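A sketch of the watchdog option, assuming a placeholder test URL against the local server, eth0:N alias naming, and a made-up failure threshold:

# If an application-level check against the local server fails a few times in
# a row, take every pool alias down so other machines pick the addresses up.
import subprocess
import time
import urllib.request

TEST_URL = "http://127.0.0.1/"     # placeholder application-level check
MAX_FAILURES = 3                   # consecutive failures before giving up
POOL_SIZE = 36

def healthy():
    try:
        return urllib.request.urlopen(TEST_URL, timeout=5).status == 200
    except Exception:
        return False

failures = 0
while True:
    failures = 0 if healthy() else failures + 1
    if failures >= MAX_FAILURES:
        for n in range(POOL_SIZE):
            subprocess.call(["ifconfig", "eth0:%d" % n, "down"])
        failures = 0
    time.sleep(10)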
The code can be written entirely in Python, and can simply use standard UNIX commands to do the low-level stuff.
-- Neil