[Foundation-l] Operations report, Nov 2005-Aug 2006
Domas Mituzas
midom.lists at gmail.com
Sat Aug 19 07:54:10 UTC 2006
This hasn't been done for a while, so I'll try to sum up changes in
our operations since November, 2005.
There has been much less insane headless-chicken running, and we've
seen quite steady operation (except for a few hiccups) lately.
First of all, for a while now we have been able to order hardware
before we were completely overloaded - being overloaded was the
constant tune in previous years.
There have been lots of system architecture changes lately too - the
way we store data, and the way we serve and cache images and text.
==Hardware==
One piece of good news is that we can still stay with the same class
of database servers, which are even getting much cheaper than before.
Database server cost per unit went from $15000 in June 2005 to $12500
in October 2005, and to $9070 in March 2006.
We got four of these servers in March and called them... db1, db2,
db3 and db4.
For the application environment we made a single $100000 purchase,
which provided us with 40 high-performance servers (each with two
dual-core Opteron processors and 4GB of RAM).
This nearly doubled our CPU capacity, and also provided enough space
for revision storage, in-memory caching, etc.
For our current caching layer expansion we ordered 20 high-performance
servers (8GB memory, four fast disks, $3300 each), which should appear
in production in about a month.
We're investigating possibilities of adding more hardware to the
Amsterdam cluster. We might end up with 10 additional cache servers
there too.
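To put the purchase figures above side by side, here is a minimal
sketch of the arithmetic. It uses only the numbers quoted in this
report; the variable and function names are illustrative, and it
assumes the four March servers were bought at the $9070 unit price.

```python
# Figures quoted in the report; names are illustrative only.
db_unit_cost = [("Jun 2005", 15000), ("Oct 2005", 12500), ("Mar 2006", 9070)]


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Price change between consecutive database server purchases.
db_steps = [
    (a_label, b_label, pct_change(a, b))
    for (a_label, a), (b_label, b) in zip(db_unit_cost, db_unit_cost[1:])
]
# Price change over the whole period (June 2005 to March 2006).
db_overall = pct_change(db_unit_cost[0][1], db_unit_cost[-1][1])

app_unit_cost = 100000 / 40    # single application server purchase
cache_order_total = 20 * 3300  # current caching layer order
db_order_total = 4 * 9070      # assumes db1-db4 cost $9070 each

if __name__ == "__main__":
    for a_label, b_label, change in db_steps:
        print(f"DB server, {a_label} -> {b_label}: {change:.1f}%")
    print(f"DB server, overall: {db_overall:.1f}%")
    print(f"Application server, per unit: ${app_unit_cost:.0f}")
    print(f"Cache server order, total: ${cache_order_total}")
    print(f"Database server order (March), total: ${db_order_total}")
```

So the database unit price dropped by roughly 40% over the period, and
an application server works out to $2500 per unit.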
We also purchased $40000 worth of Foundry hardware, based on their
BigIron RX-8 platform.
We will use that as our highly available core routing layer, as well
as connectivity for the most demanding servers.
It will also allow flexible networking with upstream providers.
Our next purchase will be image hosting/archival systems, and we are
still investigating whether to use our previous approach (a big cheap
server with lots of big cheap disks), or to deploy some storage
appliance.
We reallocated some aging servers to the search cluster and other
auxiliary roles, and we continue this practice so that we end up with
a more homogeneous application environment.
==Software==
There were lots of improvements in MediaWiki itself, but additionally
Tim and Mark ended up in the Squid authors list - the changes they
made to its code were critical to proper Squid performance.
We split the database cluster, with English Wikipedia ending up on a
separate set of boxes.
Some of the old database servers got a new life as slaves for just a
few languages, which compensates for their lack of memory or of a fast
disk system.
Additionally, revision storage was moved from our core database boxes
to 'external storage clusters', which are our application servers
making use of their otherwise idle disks.
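As a rough illustration of the two changes above (a separate cluster
for English Wikipedia, and revision text kept outside the core
databases), here is a minimal sketch. It is not the MediaWiki
implementation: the cluster names, the pointer format and all
class and function names are made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical cluster names; the report only says English Wikipedia
# got its own set of boxes, not how clusters are labelled.
DEFAULT_CLUSTER = "main"
WIKI_CLUSTERS = {"enwiki": "english"}


def core_cluster_for(wiki: str) -> str:
    """Pick the core database cluster that serves a given wiki."""
    return WIKI_CLUSTERS.get(wiki, DEFAULT_CLUSTER)


@dataclass
class ExternalStore:
    """Stand-in for an 'external storage cluster' holding revision text."""
    name: str
    blobs: dict = field(default_factory=dict)

    def put(self, text: str) -> str:
        blob_id = len(self.blobs) + 1
        self.blobs[blob_id] = text
        # The pointer format is an assumption made for this sketch.
        return f"{self.name}/{blob_id}"

    def get(self, pointer: str) -> str:
        name, blob_id = pointer.rsplit("/", 1)
        if name != self.name:
            raise KeyError(pointer)
        return self.blobs[int(blob_id)]


@dataclass
class CoreDatabase:
    """Core database: keeps only a pointer per revision, not the text."""
    cluster: str
    revisions: dict = field(default_factory=dict)  # rev_id -> pointer


def save_revision(core: CoreDatabase, store: ExternalStore,
                  rev_id: int, text: str) -> None:
    """Write the text to external storage, keep the pointer in core."""
    core.revisions[rev_id] = store.put(text)


def load_revision(core: CoreDatabase, store: ExternalStore,
                  rev_id: int) -> str:
    """Follow the pointer from the core database to external storage."""
    return store.get(core.revisions[rev_id])
```

The point of the design is that the core database boxes hold only
small rows, while the bulky text sits on disks that would otherwise
be idle.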
Optimization work is going on along multiple lines.
"Make it faster" means not only serving more requests per second, but
also reducing response times, and both issues are worked on
constantly.
And of course, as always, the team has been marvelous ;-) Thanks!
--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]