Re: [Wikitech-l] Load balancing

2 Nov 2003


On Sat, Nov 01, 2003 at 05:13:43PM +0100, Lars Aronsson wrote:
...
Jens Frank wrote:
...
I took the number "90%" as an example. You can make it 99% if you like,
No, I don't like to "make it" any number at all.  I prefer to base
my opinions on observations from real, running systems.  In my
experience, load balancers (just like any Internet router) have near
0% downtime, at most one tenth of the web server applications
involved, especially since the latter tend to be in constant
development.
Yes, I agree: I have never seen one of our Cisco load balancers go down
due to hardware failure. What James proposed was using a Linux box. In
contrast to the Cisco, this one will not have a redundant power supply
and will have moving parts: fans and disk drives. And disks do fail.
The next point is detecting the failure of a node in the cluster.
Apache port responding/not responding is easy to detect. But is
the information received correct? Is the server in sync? Is the
server able to connect to the database? Is the image directory up
to date? This is the area where I saw load balancers fail.
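To illustrate the difference between "the port answers" and "the node is actually usable", here is a minimal sketch in Python. It is not what any particular load balancer does. The function names are made up for this example, and the deeper checks (database reachable, server in sync, image directory current) are passed in as callables because how to test them depends on the installation.

```python
import socket
from typing import Callable, Dict, List


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Shallow check: does anything accept TCP connections on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every named check and return the names of those that fail.

    A check that raises an exception is counted as failed, so one broken
    probe cannot hide the state of the others.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

A node would stay in rotation only if failed_checks() returns an empty list for a set such as {"apache_port": ..., "database": ..., "in_sync": ..., "images": ...}, where the port entry wraps port_is_open() and the rest are site-specific probes.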
...
If you can report differing experience, I would listen
to your arguments.  But if all you can produce is various guesses in
the 90-99 % range, this becomes pointless.  Did you ever buy a Cisco
router that had 99% availability?
Yes. But Cisco gave us a new one to replace it. And we're not talking
about a Cisco load balancer, those are pretty expensive toys.
...
...
I just want to point out that availability will not increase.
I understand that this is what you want, but I still think you are
wrong.
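The disagreement can be made concrete with the usual series/parallel availability arithmetic. The sketch below assumes that components fail independently, and the figures are purely illustrative, in the spirit of the "90%" example, not measurements of any real system.

```python
def parallel(availabilities):
    """Availability of redundant components: up if at least one is up."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def series(availabilities):
    """Availability of a chain: up only if every component is up."""
    all_up = 1.0
    for a in availabilities:
        all_up *= a
    return all_up


# Illustrative numbers only (assumed, not observed).
single = 0.90                                  # one web server on its own
two_servers = parallel([0.90, 0.90])           # about 0.99
behind_lb_99 = series([0.99, two_servers])     # about 0.98 with a 99% balancer
behind_lb_90 = series([0.90, two_servers])     # about 0.89 with a 90% balancer
```

Under these assumptions the cluster beats a single server only when the load balancer itself is more available than that server, so the answer depends on the balancer figure one is willing to assume.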
Just to tease everybody, here is the corresponding table for
http://susning.nu/Sverige (a 44 kbyte page):
  Week       Beginning     Downtime    Slowness   Avg access time (seconds)

  2003-w44   27 Oct 2003     1 %          1 %        0.66
  2003-w43   20 Oct 2003     1 %          0 %        0.31
  2003-w42   13 Oct 2003     1 %          2 %        0.73
  2003-w41    6 Oct 2003     2 %          2 %        0.66
  2003-w40   29 Sep 2003     0 %          1 %        0.32
  2003-w39   22 Sep 2003     0 %          0 %        0.25
  2003-w38   15 Sep 2003     0 %         15 %        2.24
  2003-w37    8 Sep 2003     0 %         84 %        9.72 (oops!)
  2003-w36    1 Sep 2003     0 %          2 %        0.50
  2003-w35   25 Aug 2003     0 %          3 %        1.19
  2003-w34   18 Aug 2003     0 %          2 %        0.60
  2003-w33   11 Aug 2003     0 %          4 %        0.75
  2003-w32    4 Aug 2003     0 %          1 %        0.26
  2003-w31   28 Jul 2003     0 %          0 %        0.63
  2003-w30   21 Jul 2003     0 %          5 %        0.92
Once again, these are my observations, not neutral facts.  If you have
observations that differ significantly from these, please tell me.
OK, several possible conclusions:
- Susning might have fewer hits than Wikipedia
- Susning's software might be better than the current MediaWiki release
- Susning might be running on a hell of a machine
- ....
The only thing you prove by these figures is that Susning is faster.
And I agree that this has to change. Wikipedia should be as fast
as Susning. Clustering is the way to achieve this. But several
different ways to implement a cluster are available. The classical
ones are
* a load balancer, either dedicated hardware or routing software running
  on an ordinary server, in front of the web servers, or
* cluster software such as "heartbeat", "Sun Cluster" or "IBM HACMP"
  running on the web server nodes and taking over services when the
  other cluster partner dies.
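
The second option rests on a simple idea: each node listens for periodic "I am alive" messages from its partner and takes over when they stop. The Python sketch below shows only that idea. It is not how "heartbeat", "Sun Cluster" or "IBM HACMP" are actually implemented, and the class name and timeout are hypothetical.

```python
import time


class PartnerMonitor:
    """Tracks the partner's heartbeats and decides when to take over."""

    def __init__(self, timeout_s: float, now=time.monotonic):
        self.timeout_s = timeout_s
        self._now = now               # injectable clock, useful for testing
        self._last_seen = now()
        self.active = False           # True once this node has taken over

    def heartbeat_received(self) -> None:
        """Record that the partner has just reported in."""
        self._last_seen = self._now()

    def poll(self) -> bool:
        """Take over if the partner has been silent longer than timeout_s.

        Returns True if this node is (now) the active one.
        """
        silent_for = self._now() - self._last_seen
        if not self.active and silent_for > self.timeout_s:
            self.active = True        # real software would move the service IP here
        return self.active
```

The sketch ignores the case where both nodes are alive but cannot hear each other, so both would go active; real cluster software has to deal with that situation.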
Regards,
JeLuF
