On 18/10/12 09:25, David Gerard wrote:
> Whenever an article hits Reddit,[1] the server suffers under the load.
> Typically it goes into swap and thrashes itself to death. If we're
> really lucky the oom-killer comes out to play and shoots things
> randomly (usually Apache, maybe Lucene). The fun bit: sometimes it
> does this for no visible reason, just tips over into swap and promptly
> stops talking to the world (my shell session still works slowly).
It's funny how everyone is telling you how to use less CPU when your
problem is actually memory.
I think you should switch everything to FastCGI, and use a single
FastCGI process pool for all wikis. Reduce the maximum number of
FastCGI workers severely, until the PHP memory_limit multiplied by the
maximum worker count is less than the amount of memory you have
available for PHP (i.e. physical RAM minus memcached, Lucene, etc.).
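A rough sketch of that sizing rule, in case it helps. The function name and all the figures are hypothetical, not measurements from your server; substitute your own memory_limit and whatever memcached, Lucene and friends actually use.

```python
def max_fastcgi_workers(physical_ram_mb, reserved_mb, memory_limit_mb):
    """Largest worker count whose worst-case PHP usage fits in the PHP budget."""
    # Budget for PHP = physical RAM minus everything else (memcached, Lucene, ...).
    php_budget_mb = physical_ram_mb - reserved_mb
    if php_budget_mb <= 0 or memory_limit_mb <= 0:
        return 0
    # Worst case: every worker grows to the full memory_limit at once.
    return php_budget_mb // memory_limit_mb


# Assumed figures: 4096 MB of RAM, 1500 MB reserved, memory_limit = 128M
print(max_fastcgi_workers(4096, 1500, 128))  # prints 20
```

With those assumed numbers, 20 workers at 128 MB each come to 2560 MB, which stays under the 2596 MB left for PHP.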
The point of this is to decouple Apache's MaxClients from the maximum
memory usage. It's essential to have a high MaxClients on an Apache
installation that's directly serving remote users, because Apache will
have a lot of threads just waiting around for communication with the
remote users to complete, even if you disable keepalive.
With FastCGI, you can have a tiny PHP process pool, and in the event
of high load, client connections will politely queue in Apache waiting
for a FastCGI slot, instead of all trying to run PHP at once and
sending your server into swap death.
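To illustrate the queueing behaviour, here is a toy model rather than anything resembling real FastCGI: a thread pool stands in for the PHP worker pool, and a burst of requests larger than the pool simply waits its turn. POOL_SIZE, run_burst and the sleep are all made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # hypothetical stand-in for the FastCGI worker limit


class ConcurrencyMeter:
    """Tracks how many requests are being handled at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def run_burst(num_requests, pool_size=POOL_SIZE):
    """Send num_requests at a pool of pool_size workers; return results and peak concurrency."""
    meter = ConcurrencyMeter()

    def handle(request_id):
        with meter:
            time.sleep(0.01)  # stand-in for PHP doing some work
        return request_id

    # Requests beyond pool_size wait in the executor's queue, much as
    # client connections would wait in Apache for a free FastCGI slot.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle, range(num_requests)))
    return results, meter.peak


results, peak = run_burst(10)
print(len(results), peak <= POOL_SIZE)  # prints: 10 True
```

All ten requests complete, but no more than two are ever running at once, so peak memory is bounded by the pool size rather than by the size of the burst.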
Once you've done that, you should then disable swap. I am generally
anti-swap -- having swap means that, rather than a single process being
killed when the server runs out of memory, the whole server becomes
unresponsive, often requiring a power-cycle. But it's
especially bad to use swap on Linode, where I/O can be so slow that
even light swapping can cause the server to be unresponsive.
You can use /proc/[pid]/oom_adj to reduce the chance of oom-killer
killing Lucene or some other useful process. oom-killer is weird and
buggy and sometimes just does its own thing, but you may as well at
least try to teach it some manners. Android uses oom_adj to control
memory usage on phones, and it seems to work for them.
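A minimal sketch of setting it from a script, assuming the legacy oom_adj interface, which takes values from -17 (never kill this process) to 15 (kill it first). The helper name and the proc_root parameter are mine, the latter only so the function can be tried against a scratch directory. Lowering the value needs root, and newer kernels prefer oom_score_adj (range -1000 to 1000) instead.

```python
import os

OOM_ADJ_MIN = -17  # -17 exempts the process from oom-killer entirely
OOM_ADJ_MAX = 15   # 15 makes the process the preferred victim


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write an oom_adj value for the given pid and return the path written."""
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError("oom_adj must be between -17 and 15")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path


# Example (as root), where lucene_pid is whatever PID Lucene is running as:
# set_oom_adj(lucene_pid, -17)
```

The setting belongs to the process, not the service, so it has to be reapplied whenever the process is restarted.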
An interesting thing about FastCGI is that you can run the workers in
a chroot. If you have 4GB of memory, then I guess you are using a
64-bit Linux distribution. In the worst case, a 64-bit architecture
will have double the memory usage of a 32-bit architecture, due to
pointer sizes. It turns out that some things MW does are not very far
away from that worst case. The schroot "personality" parameter makes
it easy to install a chroot environment for PHP which uses 32-bit
binaries on a 64-bit host.
If you reduce MW's typical memory usage to, say, 2/3 of its current
value, then you can reduce the memory_limit by the same factor, which
implies that you can increase the FastCGI process pool size by 50% for
a corresponding increase in maximum throughput, assuming CPU is not
maxed out.
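The arithmetic, spelled out with a hypothetical starting pool of 20 workers: the product of pool size and memory_limit stays constant, so cutting memory_limit to 2/3 lets the pool grow by a factor of 3/2.

```python
from fractions import Fraction


def scaled_pool_size(current_pool, memory_ratio):
    """Pool size that keeps pool * memory_limit constant after memory_limit shrinks."""
    # memory_ratio is new memory_limit / old memory_limit, e.g. 2/3.
    return int(current_pool / Fraction(memory_ratio))


# Assumed pool of 20 workers, memory use cut to 2/3 of its old value
print(scaled_pool_size(20, Fraction(2, 3)))  # prints 30
```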
-- Tim Starling