On Thu, Nov 26, 2015 at 10:46 PM, Ori Livneh ori@wikimedia.org wrote:
It seems that eventlog1001 has not received any events since 01:30 UTC on Thursday:
http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Misce...
This is pretty severe; I'd page if it weren't a US holiday.
The Kafka clients on eventlog1001 were stuck in an "Autocommitting consumer offset" death-loop and were not receiving any events from the Kafka brokers. I ran eventloggingctl stop followed by eventloggingctl start, and they recovered. This needs to be investigated more thoroughly. Otto, can you follow up?