On Thu, Jul 11, 2002 at 05:21:40PM +0100, Neil Harris wrote:
One interesting observation is that even when the English-language
Wikipedia is jammed up,
the international ones, which I believe run on the same server, are
often working OK. This suggests a software, not a hardware, problem.
I've seen both English-down-others-working, and all-down.
Database performance is likely to be constrained by
two things:
* locking
* disk I/O
Locking is a problem because it serializes accesses, reducing
opportunities for parallel processing, and creating bottlenecks on the
locked resources.
Lock contention can be reduced by:
* locking for as short a time as possible
* locking with the finest grain possible
* using a database which supports concurrent transactions with reduced
locking
What about switching to Postgres?
It is said to have better locking.
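To illustrate the two lock-discipline points above, here's a minimal Python sketch (the names are hypothetical, not Wikipedia's actual code) contrasting a lock held across a whole slow request with one held only around the shared-state update:

```python
import threading

counter_lock = threading.Lock()  # fine-grained: guards only this counter
page_views = {}

def record_view_coarse(title, render):
    # Bad: the lock is held across the whole (slow) render,
    # so every other request serializes behind it.
    with counter_lock:
        page_views[title] = page_views.get(title, 0) + 1
        return render(title)

def record_view_fine(title, render):
    # Better: the lock is held only for the brief shared-state
    # update; the slow render runs outside the critical section,
    # so renders can proceed in parallel.
    with counter_lock:
        page_views[title] = page_views.get(title, 0) + 1
    return render(title)
```

Both functions produce the same result; the difference is purely how long other threads are blocked.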
Disk I/O can be made faster by
* using disks which spin fast (reducing rotational latency)
* putting them in a big RAID with lots of spindles and a high-speed
attachment
* using an operating system which multi-threads I/O properly
Wikipedia script performance is unlikely to be the bottleneck. We now
have the opportunity to load the test system heavily and measure CPU
load, so we can estimate this factor accurately.
Even if it is far from consuming 100% CPU, if it's slow, it occupies
memory for a longer time. Or it may simply be using too much memory per
thread.
Something else could be:
* Memory hogging
This is a little-known nasty factor in server programming. Here, the
problem is worker threads being tied up by slow or malfunctioning
clients, such as those on modems, or with high packet loss, or both.
Say a worker thread consumes W Mbytes of store, and an access transfers
50k bytes (400 kbits) of data.
Then a really slow link at say 20 kbps will take 20 seconds to download
this page. In doing so, it locks W Mbytes in store for that entire time.
If we have X megabytes of store, and slow clients are the dominating
factor, then we can only accommodate X/W concurrent workers, serving
(1/20)*X/W pages per second.
For X = 256, W=2, that's 6.4 hits per second. Therefore, a server needs
to have lots of RAM to prevent slow clients from blocking it. Hmm...
increasing the OS socket buffer size to > 50k might be a win here.
Fortunately, the new server has lots of RAM.
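As a sanity check on the arithmetic above, here is the same capacity model as a few lines of Python, using the figures stated in the text (50 kbyte pages, 20 kbps modem links, W = 2 Mbytes per worker, X = 256 Mbytes of store):

```python
PAGE_KBYTES = 50   # size of one page download
LINK_KBPS = 20     # a really slow modem link
X_MBYTES = 256     # store available for worker threads
W_MBYTES = 2       # store tied up per worker thread

# 50 kbytes = 400 kbits; at 20 kbps that is 20 seconds per page.
seconds_per_page = PAGE_KBYTES * 8 / LINK_KBPS

# Only X/W workers fit in memory at once.
max_workers = X_MBYTES // W_MBYTES

# Each worker completes one page every seconds_per_page seconds.
pages_per_second = max_workers / seconds_per_page

print(seconds_per_page)   # 20.0
print(max_workers)        # 128
print(pages_per_second)   # 6.4
```

On this model, raising the kernel socket buffer above one full page (> 50 kbytes) would let a worker hand the whole response to the kernel and free its W Mbytes immediately, instead of babysitting the slow link for 20 seconds.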
2 megabytes of non-shared memory per thread?
That would be enormous.
What's the real value like?
Also, while the thread is tied up, it may be unnecessarily holding a
database connection. But that's not likely to be a major problem.
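One way to answer the "what's the real value like" question empirically: a sketch using Python's standard resource module to read the process's peak resident set size. Note the unit is an assumption to verify: on Linux ru_maxrss is in kilobytes, but on some BSDs it is in bytes.

```python
import resource

def peak_rss_kbytes():
    # Peak resident set size of this process so far.
    # On Linux, ru_maxrss is reported in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(peak_rss_kbytes())
```

Comparing this figure before and after serving a request (or across worker processes) would show how much non-shared memory each worker actually costs.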
* Swapping
Once you are doing VM swapping on a webserver or database, performance
plummets. Memory leaks somewhere could be bloating processes, causing
the server to swap.
Swapping isn't a problem, it's a symptom.
Heavy Apache or MySQL bloat is unlikely, and Wikipedia threads are too
short-lived to have a chance of bloating much.