Hi!

A bit more info:

Being the earliest Opsen around, I poked around on analytics1021 and 1022 (the brokers) and found a disk failure for /dev/sdf on analytics1021. The log also had a corresponding Java call stack from when the broker died after the filesystem was remounted read-only.

I unmounted the disk and found that a simple fsck won't be enough. I therefore disabled Puppet so that it would not restart the broker service in an endless loop and fill up /.

Faidon silenced the Icinga noise with a patch.

There are at least two problems:

1. Having only one of the two brokers alive evidently isn't enough capacity. Jgage mentioned on IRC that additional capacity is planned.

2. Ori observed: <ori> presumably the alert is flapping because the script manages to poll twice between flushes, in which case drerr has not gone up

Sean