Gabriel Wicke wrote:
On Tue, 06 Jan 2004 15:09:01 -0800, Jimmy Wales wrote:
Gabriel Wicke wrote:
[diagram excerpt: BIND DNS round robin, multiple A records]
How well does DNS round robin work in practice as compared to a proper
load balancer?
I think it's the solution with the least control over how the traffic will
actually be distributed. On the other hand the Squids might not be
the bottleneck anytime soon, and by that time it would probably make sense
to install a second, possibly distributed Squid layer. It might also
save some time, because the work to install Heartbeat would only need to be
done once, on the Squids. Once a second Squid layer is installed, any
separate Linux Directors are no longer needed (and the Squids
will need Heartbeat then). Except maybe for balancing the database, but
that's probably a different setup altogether.
I've never tried DNS round robin because people say it sucks. But I
have no actual knowledge from first-hand experience.
Me neither.
I do know that
load balancing using the tools that
linuxvirtualserver.org talks about
works great, and gives a very good and predictable level of control.
I've asked the Squid guys about the load-balancing setup: if ICP is
disabled (Apache doesn't support it) Squid would do a dumb round robin
without weighting. With ICP enabled (as a dumb echo on port 7) it
will take the weighting into account (I'm waiting for an answer on how
this works out if the ICP round trips are all very quick, as they would
be with port 7). The best solution would be an ICP daemon that delays
ICP responses depending on the system load; then the system load would be
exactly the same on all Apaches. If the Apaches are all similar in
performance this is an academic question anyway. Linux Director also does
round robin with static weighting unless feedbackd is installed.
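For concreteness, the Squid side of this might look like the following
squid.conf sketch — hostnames, ports and weights are illustrative
assumptions, not our actual config:

```
# Apaches as parent peers: HTTP on 80, ICP pointed at the echo
# service on port 7, with static weights applied to peer selection.
cache_peer apache1.example.org parent 80 7 weight=2
cache_peer apache2.example.org parent 80 7 weight=1

# With ICP disabled instead (no-query, ICP port 0), selection
# degrades to the dumb unweighted round robin described above:
# cache_peer apache1.example.org parent 80 0 no-query
```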
Gabriel Wicke
Here's another way of doing it without the load balancers: have a pool
of say 36 IP addresses (chosen because lots of numbers exactly or
approximately divide into it: other numbers would do fine), and do
simple DNS round-robin to balance user traffic across these. Browsers
and DNS caches will cache DNS lookups, but we don't care, so long as
traffic is distributed roughly evenly across the IP addresses.
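In BIND the pool is nothing more than repeated A records for one name;
BIND rotates the order of the returned set by default, which spreads
clients across the addresses. A sketch, with made-up name, TTL and
addresses:

```
; One name, many A records; BIND cycles the answer order,
; distributing clients roughly evenly across the pool.
www   300   IN   A   10.0.0.1
www   300   IN   A   10.0.0.2
www   300   IN   A   10.0.0.3
; ... and so on through all 36 addresses in the pool.
```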
Now, have N << 36 Squid servers, and allocate the IP addresses
_dynamically_ to the servers. If they are all on the same Ethernet, this
can be done using ARP.
Very roughly:
First, set up every machine so that it can serve requests on any IP
address in the shared pool, on each Ethernet card. Requests will be seen
so long as the corresponding logical interface is configured "up" on that
machine.
Now start a polling process:
* every machine checks whether there is a live machine at each IP
address in the pool at some reasonable pseudo-randomized interval, by
doing an ARP request
* if it sees that another machine as well as itself is serving an IP
address from the pool, it will take down that logical interface, to
prevent allocation collisions
* when it sees there isn't, it looks at the number of addresses assigned
to each machine (which it knows from its ARP scans)
* am I the machine with the lowest MAC address from the set of
least-assigned machines? --> take over the IP by calling 'ifconfig' for
that interface
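The decision each machine makes from its ARP-scan view can be sketched in
Python (the language proposed below). This models only the collision and
takeover rules; the actual probing ('arping') and interface changes
('ifconfig') are omitted, and all names and addresses are made up:

```python
def decide(my_mac, served_by):
    """served_by maps each pool IP to the set of MAC addresses seen
    answering ARP for it on the last scan.  Returns (take, drop):
    IPs this machine should claim, and IPs it should release because
    another machine is also serving them."""
    # Collision rule: if another machine as well as ourselves is
    # serving an IP, take down that logical interface.
    drop = {ip for ip, macs in served_by.items()
            if my_mac in macs and len(macs) > 1}

    # Count addresses assigned to each machine, from the ARP scans.
    counts = {}
    for macs in served_by.values():
        for mac in macs:
            counts[mac] = counts.get(mac, 0) + 1
    counts.setdefault(my_mac, 0)  # we may currently serve nothing

    # Takeover rule: the machine with the lowest MAC among the
    # least-assigned machines claims one unserved address per round.
    take = set()
    unserved = sorted(ip for ip, macs in served_by.items() if not macs)
    least = min(counts.values())
    least_assigned = [m for m, c in counts.items() if c == least]
    if unserved and my_mac == min(least_assigned):
        take.add(unserved[0])
    return take, drop
```

Note that a momentary race (two machines claiming the same IP) is caught
by the collision rule on the next scan, matching the "self-fixing"
property claimed below.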
The interval is pseudo-random to break symmetry to prevent
synchronization by entrainment and thus make races very unlikely. (Note
that races are self-fixing, see below).
Also, excessively heavily loaded machines could _take down_ logical
interfaces, to allow other machines to take up the load. This would be
done on the basis of exceeding a "fair share": a damping mechanism
would be needed to stop excess re-balancing and to aid convergence.
(Margin too small to contain details of formula for stable damping
thresholds and timeouts).
Upside: simple and distributed. Race conditions will be very rare
(p<0.001), and rapidly corrected. If we poll each IP address once a
second from each machine, a new server will be allocated to a fallen IP
address within substantially less than a minute (it gets less as N gets
bigger). Overhead on each machine: user-space doing an ARP probe once a
second, the kernel listening to N ARP requests per second.
Downside: when an IP address is transferred from one machine to another,
any HTTP transfer in progress will time out. However, as this should
only occur when nodes fail, or when new or rebooted nodes are brought up
within the cluster, this is probably acceptable.
Doing things this way could possibly eliminate the need for the
load-balancing boxes completely, and the scripts are no more complex
than the heartbeat script.
Possible problem: partial failure of a system, so that it responds to
network requests but not application-level requests.
Two possible resolutions:
* watchdog-like process which performs local tests, and in the event of
failure, shuts down all logical interfaces in the pool on that machine,
allowing them to be picked up by other machines
* remote monitoring, and remote operator intervention via SSH or the
power strip, if the above fails (automated STONITH does not make sense...)
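The watchdog resolution could be sketched like this in Python — the URL,
interface aliases and the 'ifconfig' invocation are illustrative
assumptions, not a tested setup:

```python
import subprocess
import urllib.request

def healthy(url="http://127.0.0.1/", timeout=2):
    """Local application-level test: does our own HTTP port answer?"""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False

def release_pool(aliases):
    """On failure, take down every logical interface in the pool so
    the other machines' polling loops pick the addresses up."""
    for alias in aliases:  # e.g. ["eth0:1", "eth0:2"]
        subprocess.run(["ifconfig", alias, "down"], check=False)

# Typical use from cron or a loop:
#     if not healthy():
#         release_pool(["eth0:1", "eth0:2"])
```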
The code can be written entirely in Python, and can simply use standard
UNIX commands to do the low-level work.
-- Neil