-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
== Summary ==
A full disk on the database master for the non-en.wikipedia.org wikis
made most of our wikis uneditable for about 2.5 hours, during European
midday / US morning.
The immediate problem is repaired; some minor further cleanup is needed,
and procedural changes are recommended.
== Disruption and data loss ==
The last edits to make it to the slave servers were at:
2007-01-19 11:50:29 UTC
A few more edits made it through on samuel before it stopped accepting
data, the last of them at:
2007-01-19 11:51:03 UTC
(23 broken edits on de.wikipedia.)
After that point the database accepted no more writes, leaving the wikis
read-only, which kept any further consistency problems from developing.
There _may_ be some minor problems related to caching of revision data
where ID numbers overlap from the old server, but this is unclear.
== Inspection and repair ==
I was woken up around 13:50 to take a look, informed that samuel
(non-enwiki master) was out of disk space and wikis were read-only.
After a few minutes to check that the slaves were consistent and that
there wasn't _too_ bad a lag between them and the master, I decided to
go ahead with a master switch to adler, leaving samuel out of service
until it gets re-cloned.
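As a rough illustration of the consistency check described above, the sketch below orders slaves by how far they have executed the old master's binlog. It assumes the positions were read by hand from the Relay_Master_Log_File and Exec_Master_Log_Pos columns of SHOW SLAVE STATUS on each slave. The binlog file names, the positions, and every host name other than adler are hypothetical, and this is not a record of the commands actually run during the switch.

```python
import re


def binlog_index(name):
    """Return the numeric suffix of a binlog file name such as 'samuel-bin.000210'."""
    match = re.search(r"\.(\d+)$", name)
    if match is None:
        raise ValueError(f"not a binlog file name: {name!r}")
    return int(match.group(1))


def position_key(position):
    """Turn a (log file, byte offset) pair into a sortable tuple."""
    log_file, offset = position
    return (binlog_index(log_file), offset)


def slaves_by_position(status):
    """List slave hosts from most advanced to least advanced.

    `status` maps host name -> (Relay_Master_Log_File, Exec_Master_Log_Pos).
    """
    return sorted(status, key=lambda host: position_key(status[host]), reverse=True)


def all_consistent(status):
    """True when every slave has executed up to exactly the same position."""
    return len({position_key(pos) for pos in status.values()}) == 1


# Hypothetical values for illustration only.
status = {
    "adler": ("samuel-bin.000210", 5000),
    "slave-b": ("samuel-bin.000210", 4200),
    "slave-c": ("samuel-bin.000209", 99999),
}
ordering = slaves_by_position(status)
consistent = all_consistent(status)
```

The most advanced slave is the natural candidate for the new master; any slave behind it would need to catch up or be re-cloned before it is pointed at the new master.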
By 14:26 the master switch was done, and read-write service restored.
== Further work: immediate ==
If really desired, we may be able to clone the small number of 'lost'
edits from samuel.
Once we no longer need samuel's data, it should have its database
re-cloned from one of the slaves consistent with the new state, and it
can be restored to slave service.
== Further work: long-term ==
Our procedure for monitoring disk space and cleaning up binlogs is terrible.
Low-disk warnings from Nagios are routinely ignored, in part because the
thresholds seem much too high.
Binlog cleanup appears to be entirely manual and ad-hoc; there is no set
schedule or assignment to do this.
The good news is this task is easy to automate.
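A minimal sketch of what an automated cleanup could decide, assuming the list of binlogs on the master and the file each slave is currently reading are collected separately (for example from SHOW MASTER LOGS and SHOW SLAVE STATUS). The function names, the `keep` safety margin, and the binlog file names are hypothetical. The sketch only builds a PURGE MASTER LOGS TO statement, which removes logs older than the named file; it does not connect to a server.

```python
import re


def binlog_index(name):
    """Return the numeric suffix of a binlog file name such as 'samuel-bin.000106'."""
    match = re.search(r"\.(\d+)$", name)
    if match is None:
        raise ValueError(f"not a binlog file name: {name!r}")
    return int(match.group(1))


def purge_target(master_logs, slave_logs, keep=2):
    """Pick the binlog to purge up to (that file itself is kept), or None.

    master_logs: binlog file names present on the master.
    slave_logs:  the master binlog each slave is currently executing.
    keep:        extra files to retain before the oldest one still in use
                 (a hypothetical safety margin).
    """
    if not slave_logs:
        # Without slave positions we cannot tell what is safe to delete.
        return None
    ordered = sorted(master_logs, key=binlog_index)
    cutoff = min(binlog_index(name) for name in slave_logs) - keep
    for name in ordered:
        if binlog_index(name) >= cutoff:
            # Nothing to purge if the target is already the oldest file.
            return None if name == ordered[0] else name
    return None


def purge_statement(target):
    """Build the SQL statement for a target chosen by purge_target()."""
    return f"PURGE MASTER LOGS TO '{target}'"


# Hypothetical file names for illustration only.
master_logs = [f"samuel-bin.{i:06d}" for i in range(100, 111)]
slave_logs = ["samuel-bin.000108", "samuel-bin.000110"]
target = purge_target(master_logs, slave_logs)
statement = purge_statement(target)
```

Refusing to purge when no slave positions are known is deliberate: deleting a binlog that a lagging slave still needs would break replication for that slave.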
Recommendations:
* Automate cleanup of binlogs on the db masters.
* Make low-disk warnings more reasonable and more visible for the masters
specifically (where it really, really matters).
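To illustrate the second recommendation, here is a small sketch of a free-space check that reports status codes in the Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL). The percentage thresholds are hypothetical defaults, not the values configured on our servers; the point is only that they can be set per host role, so the masters get their own limits.

```python
import shutil

# Status codes in the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_free_space(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Return (code, label) for the given free space.

    warn_pct and crit_pct are hypothetical thresholds, expressed as the
    percentage of the disk that is still free.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= crit_pct:
        code = CRITICAL
    elif free_pct <= warn_pct:
        code = WARNING
    else:
        code = OK
    return code, LABELS[code]


def check_path(path, warn_pct=20.0, crit_pct=10.0):
    """Classify the filesystem holding `path` using the thresholds above."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.free, usage.total, warn_pct, crit_pct)


current = check_path(".")
```

A wrapper script could print the label and exit with the code, which is the form Nagios expects from a check.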
- -- brion vibber (brion @ pobox.com)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFsOS1wRnhpk1wk44RAjujAKDLga9UHrs9Z5o0E6DM24puZvkSMwCeO9N0
/TIoWOSKKdUMOO3Lu5Bdn0M=
=R6SD
-----END PGP SIGNATURE-----