Gabriel Wicke wrote:
On Tue, 06 Jan 2004 15:09:01 -0800, Jimmy Wales wrote:
Gabriel Wicke wrote:
[diagram excerpt: BIND DNS round robin, multiple A records]
How well does DNS round robin work in practice as compared to a proper
load balancer?
I think it's the solution with the least control over how the traffic will
actually be distributed. On the other hand the Squids might not be
the bottleneck anytime soon, and by that time it would probably make sense
to install a second, possibly distributed Squid layer. It might also
save some time, because the work to install Heartbeat would only need to be
done once, on the Squids. Once a second Squid layer is installed, any
separate Linux Directors are no longer needed (and the Squids
will need Heartbeat then). Except maybe for balancing the database, but
that's probably a different setup altogether.
I've never tried DNS round robin because people say it sucks. But I
have no actual knowledge from first-hand experience.
Me neither.
I do know that
load balancing using the tools that
linuxvirtualserver.org talks about
works great, and gives a very good and predictable level of control.
I've asked the Squid guys about the load-balancing setup: if ICP is
disabled (Apache doesn't support it) Squid would do a dumb round robin
without weighting. With ICP enabled (as a dumb echo on port 7) it
will take the weighting into account (I'm waiting for an answer on how
this works out if the ICP round trips are all very quick, as they would
be with port 7). The best solution would be an ICP daemon that delays
ICP responses depending on the system load; then the system load would be
exactly the same on all Apaches. If the Apaches are all similar in
performance this is an academic question anyway. Linux Director also does
round robin with static weighting unless feedbackd is installed.
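For concreteness, the Squid side of this might look like the following
squid.conf sketch — hostnames, ports and weights are illustrative
assumptions, not our actual config:

```
# Apaches as parent peers: HTTP on 80, ICP pointed at the echo
# service on port 7, with static weights applied to peer selection.
cache_peer apache1.example.org parent 80 7 weight=2
cache_peer apache2.example.org parent 80 7 weight=1

# With ICP disabled instead (no-query, ICP port 0), selection
# degrades to the dumb unweighted round robin described above:
# cache_peer apache1.example.org parent 80 0 no-query
```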
Gabriel Wicke
Here's another way of doing it without the load balancers: have a pool
of say 36 IP addresses (chosen because lots of numbers exactly or
approximately divide into it: other numbers would do fine), and do
simple DNS round-robin to balance user traffic across these. Browsers
and DNS caches will cache DNS lookups, but we don't care, so long as
traffic is distributed roughly evenly across the IP addresses.
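In BIND the pool is nothing more than repeated A records for one name;
BIND rotates the order of the returned set by default, which spreads
clients across the addresses. A sketch, with made-up name, TTL and
addresses:

```
; One name, many A records; BIND cycles the answer order,
; distributing clients roughly evenly across the pool.
www   300   IN   A   10.0.0.1
www   300   IN   A   10.0.0.2
www   300   IN   A   10.0.0.3
; ... and so on through all 36 addresses in the pool.
```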
Now, have N << 36 Squid servers, and allocate the IP addresses
_dynamically_ to the servers. If they are all on the same Ethernet, this
can be done using ARP.
Very roughly:
First, set up every machine so that it can serve requests on any IP
address in the shared pool, on each Ethernet card. Requests will be seen
so long as the corresponding logical interface is configured "up" on that
machine.
Now start a polling process:
* every machine checks whether there is a live machine at each IP
address in the pool at some reasonable pseudo-randomized interval, by
doing an ARP request
* if it sees that another machine as well as itself is serving an IP
address from the pool, it will take down that logical interface, to
prevent allocation collisions
* when it sees there isn't, it looks at the number of addresses assigned
to each machine (which it knows from its ARP scans)
* am I the machine with the lowest MAC address from the set of
least-assigned machines? --> take over the IP by calling 'ifconfig' for
that interface
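The decision each machine makes from its ARP-scan view can be sketched in
Python (the language proposed below). This models only the collision and
takeover rules; the actual probing ('arping') and interface changes
('ifconfig') are omitted, and all names and addresses are made up:

```python
def decide(my_mac, served_by):
    """served_by maps each pool IP to the set of MAC addresses seen
    answering ARP for it on the last scan.  Returns (take, drop):
    IPs this machine should claim, and IPs it should release because
    another machine is also serving them."""
    # Collision rule: if another machine as well as ourselves is
    # serving an IP, take down that logical interface.
    drop = {ip for ip, macs in served_by.items()
            if my_mac in macs and len(macs) > 1}

    # Count addresses assigned to each machine, from the ARP scans.
    counts = {}
    for macs in served_by.values():
        for mac in macs:
            counts[mac] = counts.get(mac, 0) + 1
    counts.setdefault(my_mac, 0)  # we may currently serve nothing

    # Takeover rule: the machine with the lowest MAC among the
    # least-assigned machines claims one unserved address per round.
    take = set()
    unserved = sorted(ip for ip, macs in served_by.items() if not macs)
    least = min(counts.values())
    least_assigned = [m for m, c in counts.items() if c == least]
    if unserved and my_mac == min(least_assigned):
        take.add(unserved[0])
    return take, drop
```

Note that a momentary race (two machines claiming the same IP) is caught
by the collision rule on the next scan, matching the "self-fixing"
property claimed below.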
The interval is pseudo-random to break symmetry to prevent
synchronization by entrainment and thus make races very unlikely. (Note
that races are self-fixing, see below).
Also, excessively heavily loaded machines could _take down_ logical
interfaces, to allow other machines to take up the load. This would be
done on the basis of exceeding a "fair share": a damping mechanism
would be needed to stop excess re-balancing and to aid convergence.
(Margin too small to contain details of formula for stable damping
thresholds and timeouts).
Upside: simple and distributed. Race conditions will be very rare
(p<0.001), and rapidly corrected. If we poll each IP address once a
second from each machine, a new server will be allocated to a fallen IP
address within substantially less than a minute (it gets less as N gets
bigger). Overhead on each machine: user-space doing an ARP probe once a
second, the kernel listening to N ARP requests per second.
Downside: when an IP address is transferred from one machine to another,
any HTTP transfer in progress will time out. However, as this should
only occur when nodes fail, or when new or rebooted nodes are brought up
within the cluster, this is probably acceptable.
Doing things this way could possibly eliminate the need for the
load-balancing boxes completely, and the scripts are no more complex
than the heartbeat script.
Possible problem: partial failure of a system, so that it responds to
network requests but not application-level requests.
Two possible resolutions:
* watchdog-like process which performs local tests, and in the event of
failure, shuts down all logical interfaces in the pool on that machine,
allowing them to be picked up by other machines
* remote monitoring, and remote operator intervention via SSH or the
power strip, if the above fails (automated STONITH does not make sense...)
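The watchdog resolution could be sketched like this in Python — the URL,
interface aliases and the 'ifconfig' invocation are illustrative
assumptions, not a tested setup:

```python
import subprocess
import urllib.request

def healthy(url="http://127.0.0.1/", timeout=2):
    """Local application-level test: does our own HTTP port answer?"""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False

def release_pool(aliases):
    """On failure, take down every logical interface in the pool so
    the other machines' polling loops pick the addresses up."""
    for alias in aliases:  # e.g. ["eth0:1", "eth0:2"]
        subprocess.run(["ifconfig", alias, "down"], check=False)

# Typical use from cron or a loop:
#     if not healthy():
#         release_pool(["eth0:1", "eth0:2"])
```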
The code can be written entirely in Python, and can simply use standard
UNIX commands to do the low-level work.
-- Neil