Jeronim and I spent about three hours doing some load testing on yongle
after taking it out of rotation. The idea was to see whether Apache
tuning could yield any kind of performance increase. Short summary:
it couldn't.

100 requests, sent at random to a sample of URLs collected from the 'en'
main page, produced the following numbers:
    Elapsed time:        7.50 secs
    Data transferred:    3137074 bytes
    Response time:       0.64 secs
    Transaction rate:    13.33 trans/sec
    Throughput:          418276.53 bytes/sec
    Concurrency:         8.51
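
The derived figures follow from the raw ones by simple arithmetic. Here is a
minimal Python sketch, assuming the usual definitions (transaction rate =
requests / elapsed time, throughput = bytes / elapsed time, concurrency =
total time spent in requests / elapsed time). The function and variable
names are illustrative only, not part of any tool we used:

```python
def summarize(requests, elapsed_secs, bytes_transferred, mean_response_secs):
    """Derive rate figures from raw load-test totals (assumed definitions)."""
    return {
        "transaction_rate": requests / elapsed_secs,        # trans/sec
        "throughput": bytes_transferred / elapsed_secs,     # bytes/sec
        # Average number of requests in flight at any moment.
        "concurrency": requests * mean_response_secs / elapsed_secs,
    }

# Raw totals from the run reported above.
stats = summarize(100, 7.50, 3137074, 0.64)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The transaction rate and throughput come out exactly as reported. The
concurrency comes out slightly above the reported 8.51, which is consistent
with the 0.64 secs response time being a rounded figure.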

At 13.33 trans/sec, that is slow. The next step was to remove all of the
Special:* and Category:* pages from the list, since they're
database-intensive. The load test then completed in 6.21 seconds, at
16.1 trans/sec: a marginal increase, but nothing significant.
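
The filtering step can be sketched in a few lines of Python. This is only
an illustration: it assumes URLs of the /wiki/Title form (not
index.php?title=...), and the sample list below is made up rather than the
list we actually tested with:

```python
from urllib.parse import urlparse, unquote

# Namespaces treated as database-intensive in the test above.
EXCLUDED_PREFIXES = ("Special:", "Category:")

def is_excluded(url):
    """True if the page title in a /wiki/Title URL is in an excluded namespace."""
    path = unquote(urlparse(url).path)
    # Everything after "/wiki/" is the title; empty if the URL has another form.
    title = path.partition("/wiki/")[2]
    return title.startswith(EXCLUDED_PREFIXES)

def filter_urls(urls):
    """Keep only URLs that are not in an excluded namespace."""
    return [u for u in urls if not is_excluded(u)]

# Hypothetical sample; the real list came from the 'en' main page.
sample = [
    "http://en.wikipedia.org/wiki/Pyrite",
    "http://en.wikipedia.org/wiki/Special:Recentchanges",
    "http://en.wikipedia.org/wiki/Category:Minerals",
    "http://en.wikipedia.org/wiki/Vietnam_War",
]
print(filter_urls(sample))
```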

We then picked two particular pages: [[Vietnam War]] (large) and
[[Pyrite]] (small). Running with 50 concurrent clients and 50 repetitions,
Pyrite rendered at about 21.6 trans/sec and Vietnam War at about
18.9 trans/sec. We then tried a static image file (from upload), which
yielded 1785 trans/sec.

We tried two different sets of Apache settings for MinSpareServers,
MaxSpareServers, StartServers, MaxClients and MaxRequestsPerChild, but the
resulting performance numbers didn't differ in any way that would hold up
as statistically significant. Bottom line: Apache itself doesn't even get
close to being a bottleneck.
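
To put the gap between static and rendered pages in perspective, here is a
small Python sketch that divides the static-file rate by each page's rate.
It uses only the figures quoted above; the variable names are illustrative:

```python
# Rendered-page rates quoted above, in transactions per second.
page_rates = {"Pyrite": 21.6, "Vietnam War": 18.9}
# Rate for the static image file served straight from upload.
static_rate = 1785.0

# How many times faster the same Apache serves a static file.
ratios = {page: static_rate / rate for page, rate in page_rates.items()}
for page, ratio in ratios.items():
    print(f"{page}: static file served about {ratio:.0f}x faster")
```

The same Apache process serves the static file roughly 80 to 95 times
faster than a rendered page, which is why the time is evidently going
somewhere other than Apache.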

A survey of the Apache machines showed about 5-19 requests/sec being
processed on them. The boxes had little free RAM, but generally weren't
swapping. If these numbers are any good indication, here are some
inferences, which should be taken as educated guesses only:
- PHP/MediaWiki are slow. A good starting point for anyone willing to
tackle this is
http://meta.wikipedia.org/wiki/Profiling/Live_aggregate_20040606
- The current Wikipedia setup won't scale much further, but things will
improve once the servers we've been having problems with become
operational. From then on, we'll likely be able to deal with growth by
just acquiring new Apaches and adding them to the farm.
- Static files are by no means a bottleneck, and should not be a
priority for distribution in the near future.
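
As a rough illustration of the "just add Apaches" point, here is a
back-of-the-envelope Python sketch. It assumes throughput scales linearly
with the number of Apaches, which in turn assumes the database and
everything else behind them keep up; nothing above establishes that. The
target load is a made-up number, and the function name is hypothetical:

```python
import math

def apaches_needed(target_rate, per_server_rate):
    """Servers needed for target_rate req/sec, assuming linear scaling."""
    return math.ceil(target_rate / per_server_rate)

# 18.9 trans/sec is the Vietnam War figure measured on yongle above;
# the target of 500 req/sec is a made-up number for illustration.
print(apaches_needed(500, 18.9))
```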
Cheers,
Ivan.