Hi yalls,
Christian and I were talking today about how to figure out why the high-traffic (bits and upload) esams varnishes occasionally have latency issues[1]. These cause buffers to fill up, which in turn causes a small amount of message loss[2]. We aren’t sure whether this is an overall system throughput issue (network and/or Kafka brokers), or something that might be fixable by tweaking more configs on the individual varnishkafkas.
We don’t think that anyone is using the webrequest bits data. It isn’t included in udp2log, and therefore doesn’t affect any legacy analytics. Nor are we using it for any productionized analytics in Hadoop. We aren’t sure whether others are relying on this data for ad hoc analysis, though.
If no one objects, we’d like to temporarily disable the bits varnishkafka instances. If we do and the esams upload varnishes then stop having problems, we will know that this is a system throughput issue. If the esams upload varnishes continue to have problems, then we will know that it is more likely a local varnishkafka issue.
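To make the test concrete, here is a rough sketch of how we might read the result. It assumes we compare samples of the esams upload delivery error rate (kafka_drerr.per_second) from before and after bits is disabled. The function name and the 0.5 cutoff are hypothetical, picked only for illustration, not something we have agreed on.

```python
from statistics import mean


def diagnose(upload_drerr_before, upload_drerr_after, improvement_ratio=0.5):
    """Classify the outcome of temporarily disabling bits varnishkafka.

    Both arguments are samples of the esams upload delivery error rate.
    improvement_ratio is an arbitrary, hypothetical cutoff for "the
    problems stopped".
    """
    before = mean(upload_drerr_before)
    after = mean(upload_drerr_after)
    if before == 0:
        # No errors in the baseline window, so there is nothing to compare.
        return "inconclusive"
    if after <= before * improvement_ratio:
        # Upload recovered once bits traffic was gone.
        return "system throughput"
    # Upload still drops messages without bits traffic.
    return "local varnishkafka"
```

In practice we would probably just eyeball the graphs linked below, but the logic is the same.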
So, are there any objections to removing webrequest bits from Kafka webrequest logs for a little while?
-Ao
[1] varnishkafka rtt average: http://grafana.wikimedia.org/#/dashboard/db/kafka?from=1420484972895&to=1420571372895&panelId=10&fullscreen or http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&hreg%5B%5D=%28amssq%7Ccp%29.%2B&mreg%5B%5D=kafka.rdkafka.brokers..%2B%5C.rtt%5C.avg&gtype=line&title=kafka.rdkafka.brokers..%2B%5C.rtt%5C.avg&aggregate=1
[2] varnishkafka delivery errors: http://grafana.wikimedia.org/#/dashboard/db/kafka?from=1420484972895&to=1420571372895&panelId=9&fullscreen or http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&hreg%5B%5D=%28amssq%7Ccp%29.%2B&mreg%5B%5D=kafka.varnishkafka%5C.kafka_drerr.per_second&gtype=line&title=kafka.varnishkafka%5C.kafka_drerr.per_second&aggregate=1