Bravo! Excellent analysis!
Keep in mind -- we have money in the bank, but there's probably about
a two-week lead time on getting new servers, if we include the time
for us to puzzle over the exact needs before placing another order.
We don't want to prematurely throw hardware (money!) at any problems
that are really software problems, but at the same time, if we feel a
need for more stuff, we can get it.
Jason is working on the migration of Bomis to here, and as that goes
forward, at some point whatever Wikimedia Foundation owns in the San
Diego colo will get transferred here. The older, slower machines can
probably be used for little stuff, or if it's the judgment of the
technical staff (that's you guys) that we're better off not using
them, then we can sell them, either to Bomis (but I want to be VERY
careful not to raise any conflict of interest issues... the money
should always flow *from* me *to* Wikipedia, not the other way around,
or I'm sure some jerkoff will say something) or on eBay.
--Jimbo
user_Jamesday wrote:
To give some idea of what makes a difference, here are
some of the things discovered over the last two weeks:
1. Storing the PHP files on NFS doubled the page load time, from around 180ms to around
360ms. So that's no longer being done, and effectively no complicated programming was
needed to cut page load time in half.
2. Squid using the disk reached a cache hit rate of 78%, and still rising, compared to
60% without the disk, but...
3. Squid using synchronous I/O blocked on disk, which sometimes resulted in timeouts.
So the disk cache has been turned off for the last week.
4. The Apaches started slowing down at peak load times earlier this week, so more Squid
investigation started, using asynchronous disk I/O this time. Given the past disk caching
experience, this has the potential to cut the load the Apaches see by around 25-50% (they
see 40% of requests without the disk cache; with it, that dropped to 22%).
5. There's a parameter in Squid which tells it when to ignore the disk, based on the
number of file descriptors in use. If the limit is exceeded, Squid ignores the disk and
just passes the request directly to the Apaches, or skips saving the page to the disk.
Tuning of this parameter is ongoing. When set correctly, it should let Squid deliver
all it can from combined disk and RAM, stopping just short of the point where it
starts to block waiting for the disk.
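
A rough sketch, in Python, of the arithmetic behind items 2 and 4 and of the
bypass rule described in item 5. The function names and the descriptor counts
are made up for illustration; this is not Squid's actual code or configuration,
just the reasoning written out.

```python
def apache_share(hit_rate):
    """Fraction of requests Squid cannot serve itself and passes to the Apaches."""
    return 1.0 - hit_rate


def load_reduction(hit_rate_before, hit_rate_after):
    """Relative drop in Apache load when the cache hit rate improves."""
    before = apache_share(hit_rate_before)
    after = apache_share(hit_rate_after)
    return (before - after) / before


def use_disk_cache(open_disk_fds, fd_limit):
    """Hypothetical bypass rule: use the disk only while the number of open
    file descriptors is within the limit; once the limit is exceeded, skip
    the disk and go straight to the Apaches."""
    return open_disk_fds <= fd_limit


if __name__ == "__main__":
    # Hit rates quoted in item 2: 60% without disk, 78% with disk.
    print(f"Apache share without disk: {apache_share(0.60):.0%}")
    print(f"Apache share with disk:    {apache_share(0.78):.0%}")
    print(f"Relative load reduction:   {load_reduction(0.60, 0.78):.0%}")

    # Illustrative numbers only; the real limit is what is being tuned.
    print(use_disk_cache(open_disk_fds=150, fd_limit=200))
    print(use_disk_cache(open_disk_fds=250, fd_limit=200))
```

With the hit rates quoted above, the Apaches' share drops from 40% to 22% of
requests, a cut of about 45%, which sits inside the 25-50% range in item 4.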
So, there's no need to get too enthusiastic about tuning the code. The new server
setup isn't fully tuned yet... and it probably won't be before we get a nice
fast database server, a second Squid and some more Apaches to spread the load around.