[Foundation-l] Cluster report, September-November, 2005
Domas Mituzas
midom.lists at gmail.com
Sun Nov 27 12:58:21 UTC 2005
Hello, just a shameless copy-paste from meta
(http://meta.wikimedia.org/wiki/Cluster_report%2C_September-November%2C_2005)
These months were yet again amazing in Wikimedia's growth history.
Since September, request rates have doubled, lots of information has
been added, modified and expanded, and more users have come.
To deal with that, the site had to improve both its software and
hardware platforms again.
Of course, more hardware was thrown at the problem.
In mid-September three new database servers (thistle, ixia, lomaria)
were added to the pool, removing an ancient type of hardware from
service.
With data growing at this rate, the 'old' 4GB-RAM boxes could not keep
up with operation, except in a quite limited role.
40 dual-Opteron application servers have been deployed, conserving
our limited colocation space as well as providing lots of
performance for the buck.
One batch of them (20) was deployed just this week.
They're equipped with larger drives and more memory, which allows us
to place various unplanned services on them (9 Apache servers are
storing old revisions as well), and some servers participate in the
shared memory pool, running memcached.
One of the really efficient purchases was the $12k image server
'amane', which provides us with storage space and even the ability to
back up at current loads.
It now runs a highly efficient and lightweight HTTP server,
lighttpd.
So far images are being served, but the growth of Wikimedia Commons
will force us to find a really scalable and reliable way to handle
lots of media.
Additionally, 10 more application servers have been ordered, together
with a new Squid cache server batch.
These 10 single-Opteron boxes will have 4 small and fast disks and
should enable efficient caching of content.
As all this gear was bought with donated money, we really appreciate
the community's help here. Thank you!
The Yahoo-supplied cluster in Seoul, Korea has finally gone into
action, bringing cached content closer to Asian locations, as well as
hosting the master databases and application cluster for the Japanese,
Thai, Korean and Malaysian Wikipedias.
For internal load balancing, Perlbal was replaced by LVS, and we've
got a nice flashy donated load balancing device that may be deployed
into operation soon as well.
LVS has to be handled with care: several tiny misconfiguration
incidents seriously affected site performance.
Lately the cluster has become quite big and complex, and we now need
more sophisticated and extensive sanity checks and test cases.
There is lots of work going into establishing more failover
capabilities - we will have two active links to our main ISP in
Florida.
The static HTML dump is (becoming) nice and usable and may help us in
case of serious crashes. It can be served from the Amsterdam cluster
as well!
Over the last several days we managed to bring the cluster into quite
proper working shape; now it's important to fix everything and
prepare for more load, more growth and yet another expansion.
We hope that, with the help of the community, we will be able to solve
all our performance and stability issues and avoid being Lohipedia :)
Lots of various problems have been solved so far in order to achieve
what we have now, and lots of low-hanging fruit has been picked.
What we are dealing with now is complex and needs manpower and fresh
ideas as well.
Discussions are always welcome in #wikimedia-tech on Freenode (except
during serious downtimes :).
And, of course, thanks, Team (or rather, Family)! It is amazing to
work together!
Cheers,
Domas