Hello all. Here's a snippet of an AIM conversation I had with Jimbo
about my plan for today:
jasonbomis: I'll be bringing Pliny and Ursula down at the same time.
Ursula will come up shortly after with an extra drive (sdb from
Pliny), and Pliny will come up later with a new primary drive and a
new OS. Somebody will have to move files off sda on Pliny before I
do this, because sda is untrustworthy (it's the only thing in Pliny that
hasn't been replaced, and he still has errors) and won't be returning
to service this trip...
jasonbomis: Gunther will go down after Ursula comes up, and will come
back up shortly thereafter with more RAM. He has two 512 MB modules
now, right? So I'll add three 1 GB modules as per the manual.
I'll be driving to the servers fairly soon, and it will take me 3
hours to get there. Then I'll be doing what I said above. If
anything goes wrong, I'll improvise and leave the servers no worse
off than they are now. At the very least, Gunther will have
more RAM.
I'm not touching Geoffrin this trip (at least that's not part of the
plan).
Call me with problems/concerns
cell 1 760-963-0681
cell 2 760-486-9194 (this one's been acting funny, so...)
--
"Jason C. Richey" <jasonr(a)bomis.com>
I've made arrangements with the European Graduate School
(http://www.egs.edu/) for them to host a squid proxy box for European
visitors. I assume we're not 100% ready to move to that solution, but
we're close, right?
Eventually we want to do some fancy DNS magic, but we don't need to
wait for that to get this off the ground. We can just point
de.wikipedia.org and/or fr.wikipedia.org and/or es.wikipedia.org,
etc. at the European machine, and that will help those groups
immediately and transparently, am I right?
I don't have an exact date for when they will be ready, but we will
have root access to the machine and will administer it ourselves, so
we don't have to depend on them (nor do we get to depend on them,
except for hardware failures I guess).
--Jimbo
We have eight 1U systems coming, and one 2U system. The 2U is shipping
today, and the 1Us will ship today or tomorrow!
The shipping time is roughly 5 days. So quite possibly next weekend
I'll be camped out in the colo setting things up.
Thanks for all your hard work, Brion. If I were on the Wikimedia board
of trustees, I would vote you a $1,000 stipend to show you the world's
gratitude.
Ed Poor
Since geoffrin's still exhibiting intermittent MySQL crashes, I've
moved the database to gunther. Thanks to replication, it only took a
few minutes to switch main servers. Rock on!
The downside is that gunther presently has only 1GB of memory, so it's
got less in-memory cache available. In view of this, I've turned off
searching for the moment to make sure we can run OK. If it seems OK,
we'll turn it back on.
The upside is that gunther has a fast SCSI RAID disk array like
geoffrin does, which should make it a lot more palatable than ursula.
Ursula is replicating updates now from gunther.
-- brion vibber (brion @ pobox.com)
We started getting some weird errors from the database for a few
minutes; "lost connection from server" and "syntax errors" on
statements that looked fine.
Checking the mysql and system error logs on geoffrin I found that
mysqld threads were getting killed off due to out-of-memory conditions.
I've lowered the key and innodb buffer sizes (from 384M each to 256M)
and restarted it; hopefully this will take care of it.
Also, I noticed that the clock on geoffrin has drifted; it's off by 8
hours and a little bit. This may be due to the hardware clock not getting
updated properly after the OS was set to UTC. I haven't corrected
it yet because I'm not sure whether setting the time _backwards_ several
hours will affect anything in the database. Timestamps in the db all come
from the webservers, so leaving it for a bit shouldn't affect anything
on the wiki.
-- brion vibber (brion @ pobox.com)
Geoffrin is now serving as the main database server (with memory down
to 2 gigs, I wasn't able to detect any errors in memtester). Ursula is
now a replicated slave; if Geoffrin explodes, in theory we should be
able to switch back over to Ursula.
I also put the initial database onto gunther, but haven't finished
setting up mysql there. I suspect I'll make that a replicant too. It has
only 1GB of memory, but it's got the big, fast RAID configuration, which
is probably a better tradeoff if we have to use it.
At some point in the next day or so I'd like to get one or another of
the faster machines to take over en.wikipedia.org and get rid of en2.
Also, if things don't fall down we can start taking down the extreme
anti-slowness measures again.
-- brion vibber (brion @ pobox.com)