A few things we've learned over the years through trial and error that may
be helpful to you.
I think your biggest improvement is probably getting a Squid/Varnish cache
set up (as recently discussed here). Depending on your traffic (logged-in vs.
anonymous viewers) you could see as much as a 6x drop in Apache requests.
It also greatly helps with the Slashdot/Reddit effect, as most of the views
in such a traffic spike are from anonymous viewers and can be handled
entirely by Squid/Varnish. It is possible to squeeze Squid onto the same
server, but you have to juggle ports and memory.
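If you do run the cache on the same box, the port juggling amounts to moving
Apache off port 80 and putting the cache in front of it. A minimal sketch with
Varnish (paths, ports, and the cache size are assumptions to adapt, not our
exact setup):

```
# /etc/apache2/ports.conf (path varies by distro) -- move Apache off port 80
Listen 127.0.0.1:8080

# Varnish startup options (e.g. in /etc/default/varnish) -- listen on 80,
# use local Apache as the backend, cap the in-memory cache at 256 MB
DAEMON_OPTS="-a :80 -b 127.0.0.1:8080 -s malloc,256m"
```

On the MediaWiki side you'd also want to look at $wgUseSquid and
$wgSquidServers in LocalSettings.php so that page edits purge the cached
copies.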
You may also be able to reduce the Apache load significantly by moving all
static resources (images, CSS, JS) to a lighter-weight server like lighttpd
or nginx. I gather some CSS/JS was recently moved from static files into the
ResourceLoader, but there is still plenty of static content to make it
worthwhile (our lighttpd server averages 10x more connections than our
Apache server). As with Squid, you can squeeze this onto the same server by
juggling ports, and it will probably actually save memory (a lighttpd/nginx
connection uses far, far less memory than an Apache one).
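As a sketch, the nginx side of this can be as small as one server block on a
spare port (the port, paths, and extension list below are assumptions, not
anything specific to your setup):

```
# nginx: serve static files only, on a juggled port behind the cache
server {
    listen 8081;
    root /var/www/mediawiki;   # assumed MediaWiki install path

    location ~* \.(png|jpe?g|gif|ico|css|js)$ {
        expires 30d;           # let browsers and the cache keep these
        access_log off;        # cuts disk I/O on busy static paths
    }
}
```

The cache (or DNS, via something like a separate images hostname) then routes
static URLs to this port and everything else to Apache.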
The net result of these two changes is to reserve Apache connections, which
are comparatively CPU- and memory-expensive, for only the requests that
actually need them. As a point of comparison, we're currently averaging 170
req/sec across all servers (Squid + lighttpd + Apache), but Apache itself is
only handling 10 req/sec. We simply couldn't handle the traffic if we tried
to make Apache serve all of it.
One quick setting you can look into is pointing $wgCacheDirectory at a local
directory. We had huge problems recently upgrading from 1.14 to 1.19 until
we set this parameter. It may have just been related to our upgrade, but
anything that reduces database usage, especially on a single server, is
usually a good thing.
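For reference, this is a one-line change in LocalSettings.php (the
"$IP/cache" path is just a common choice -- any local directory the web
server can write to works):

```
// LocalSettings.php -- cache localisation/message data on local disk
// instead of regenerating it from the database on every request.
$wgCacheDirectory = "$IP/cache";
```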
The only other significant advice I can think of is to make sure
you regularly benchmark/load test as you try to optimize things. There are
a bunch of good reasons to do this:
1) Documentation
2) Verification: i.e., did we actually make things faster, and if so, by
how much?
3) Load capacity planning: how many connections/users can we support, and
how close are we to that limit right now?
I usually just use ApacheBench (ab), which gives good-enough results, but
there are plenty of other tools and you can make the testing as complex as
you want.
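A minimal sketch of what an ab run looks like -- the URL is a placeholder for
whatever representative page you want to hammer, and the -n/-c numbers are
starting points, not recommendations:

```shell
#!/bin/sh
# Build an ApacheBench command: -n total requests, -c concurrency,
# -k HTTP keep-alive. Pass your own URL as the first argument.
URL="${1:-http://localhost/wiki/Main_Page}"   # placeholder target
CMD="ab -n 500 -c 20 -k $URL"
echo "running: $CMD"
# Uncomment to actually run it and keep a timestamped log for later comparison:
# $CMD | tee "bench-$(date +%Y%m%d-%H%M%S).log"
```

The numbers to watch in ab's output are "Requests per second" and the
percentile breakdown at the bottom; re-run the exact same command after each
change so the comparison is apples to apples.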
As for hardware, we use iWeb, which has reasonable prices although its
support is hit or miss (it's either great or terrible, it seems). We have a
variety of dedicated servers in the $100/month range, having gone for a
horizontal-scaling design (more, smaller boxes with some redundancy among
them). I don't know what resources (money) you have available or what sort
of request rate you're getting, but with some optimization and 1 or 2 boxes
I would think you should be able to survive a Slashdot/Reddit effect
without completely going under.
Beyond this, your next step is either scaling up (a faster computer) or out
(more servers). For scaling out, just look at your process usage (top), see
what is using the most CPU/memory, and split that out to the new server.
Chances are a good first step would be a dedicated database server or a
second Apache server.
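If the database is the first thing you split out, the MediaWiki side of the
move is small (the hostname below is an example -- you'd also need to grant
the wiki's MySQL user access from the web server's address):

```
// LocalSettings.php -- point the wiki at a separate database box
$wgDBserver = "db1.internal.example";
```

The bulk of the work is on the MySQL side: binding it to an interface the web
server can reach and copying the data across.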
On 17 October 2012 18:25, David Gerard <dgerard(a)gmail.com> wrote:
> The problem: In general, rationalwiki.org is melting under the strain.
>
> We can't quite afford the next step up right this moment, though want
> to plan for it, and need to keep things from intermittently blowing up
> as they are.
--
Dave Humphrey -- dave(a)uesp.net
Founder/Server Admin of the Unofficial Elder Scrolls Pages --
www.uesp.net