Thanks Sean!
On Jun 9, 2014, at 3:56 AM, Sean Pringle <springle@wikimedia.org> wrote:
Hi!
A bit more info:
Being the earliest Opsen around, I poked around on analytics1021 and 1022 (the brokers) and found a disk failure for /dev/sdf on analytics1021, along with a corresponding Java call stack in the log from when the broker died due to the fs remounting as read-only.
I unmounted the disk and found more than a simple fsck is required. I therefore disabled puppet to avoid the endless broker service restart loop, and to avoid filling up /.
Faidon silenced the Icinga noise with a patch.
The problems are at least twofold:
Only 1 of 2 brokers alive evidently isn't quite enough capacity. Jgage mentioned on IRC that additional capacity is planned.
Ori observed: < ori> presumably the alert is flapping because the script manages to poll twice between flushes, in which case drerr has not gone up
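To illustrate Ori's point, here is a minimal sketch of how a check that alerts only when a counter has risen since the previous poll would flap if it polls twice per flush. This is not the actual check script: the function name, the sample values, and the assumption that drerr is a counter updated only at flush time are all hypothetical.

```python
def alert_states(counter_samples):
    """Return one state per poll: CRITICAL only if the counter rose since the last poll."""
    states = []
    previous = None
    for value in counter_samples:
        if previous is not None and value > previous:
            states.append("CRITICAL")
        else:
            # First poll, or no change since the previous poll.
            states.append("OK")
        previous = value
    return states


# Hypothetical drerr readings (assumed values): errors keep occurring, but the
# published counter only changes at each flush, so every second poll sees
# no increase and the check recovers, then fires again.
samples = [10, 10, 14, 14, 19, 19]
print(alert_states(samples))
```

With these assumed samples the state alternates between CRITICAL and OK even though the underlying problem never went away, which is what a flapping alert looks like.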
Sean
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics