Geoffrin is physically installed and presumably will go into service when the developers (Brion, esp.) get the time. Brion had mentioned possibly this weekend, but of course *I* think he should take his time. :-)
So now we have a pretty sweet setup, but let me know: what are our next needs? When should I start thinking about shopping? What will we want to get?
Random guesses are discouraged -- I think the most productive recommendations will be based on specific information of bottlenecks or points-of-failure that have formed or will form soon based on actual empirical evidence.
I know coronelli has a stability problem, and we have money in the bank. Perhaps we could take coronelli out of rotation?
--Jimbo
On Apr 3, 2004, at 14:29, Jimmy Wales wrote:
> So now we have a pretty sweet setup, but let me know: what are our next needs? When should I start thinking about shopping? What will we want to get?
> Random guesses are discouraged -- I think the most productive recommendations will be based on specific information of bottlenecks or points-of-failure that have formed or will form soon based on actual empirical evidence.
Distribution of disk space is a bit of a sore point at present. The apache boxes don't really need much, as most of the space-sucking data comes in over the wire from the database or NFS; however, the space requirements for the database and a couple of generations of backups are pretty pushy on zwinger and suda. If we could get expanded disk capacity on those two, that would be a help. (Geoffrin, IIRC, should have a goodly amount of space itself.)
As far as points of failure go, zwinger has been pretty reliable, but it's a big single point of failure: if NFS goes down, the apaches can't do squat. We currently keep some of the common files on the local disks for performance's sake and rsync them when things are updated, but image uploads, thumbnails, and math rasterizations don't fit that model well, since those files have to be available from a randomly selected mirror less than a second after being created. We might consider adding an explicit mirroring setup into the code, or...?
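To make the "explicit mirroring" idea a bit more concrete, here is a rough sketch of what such a step might look like. It is Python rather than the wiki's own code, and it assumes each mirror can be reached as a plain directory path; the function name `write_to_mirrors` and the notion of "mirror roots" are hypothetical, not anything in the current setup.

```python
import os
import tempfile


def write_to_mirrors(data, rel_path, mirror_roots):
    """Write data to rel_path under every mirror root.

    Returns the list of mirror roots where the write failed, so the
    caller can decide whether to retry or take a mirror out of rotation.
    The mirror roots are assumed (hypothetically) to be directories that
    are reachable from the box handling the upload.
    """
    failed = []
    for root in mirror_roots:
        dest = os.path.join(root, rel_path)
        dest_dir = os.path.dirname(dest)
        try:
            os.makedirs(dest_dir, exist_ok=True)
            # Write to a temp file in the same directory and rename it
            # into place, so a reader never sees a half-written file.
            fd, tmp = tempfile.mkstemp(dir=dest_dir)
            try:
                with os.fdopen(fd, "wb") as fh:
                    fh.write(data)
                os.replace(tmp, dest)
            except OSError:
                os.unlink(tmp)
                raise
        except OSError:
            failed.append(root)
    return failed
```

The point of writing to every mirror at creation time, rather than waiting for the next rsync run, is that the file is already in place whichever mirror the following request happens to hit. A failed mirror is reported back instead of aborting the whole upload.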
> I know coronelli has a stability problem, and we have money in the bank. Perhaps we could take coronelli out of rotation?
Coronelli and browne both crash occasionally, though coronelli seems to go a little more often. They're also both running experimental kernel packages, which may reduce their stability (though they greatly improve performance compared with the RH9 stock 2.4.x kernels).
It probably wouldn't hurt to take coronelli down for another round of stress testing, though.
-- brion vibber (brion @pobox.com)
wikitech-l@lists.wikimedia.org