Hi y’all,
Christian and I were talking a bit today about how to figure out why the high-traffic (bits
and upload) esams varnishes occasionally have latency issues[1]. These cause buffers to
fill up, which in turn causes a small amount of message loss[2]. We aren’t totally sure whether this is
an overall system throughput issue (network and/or Kafka brokers) or something that
might be fixable by tweaking more configs on the individual varnishkafka instances.
We don’t think that anyone is using the webrequest bits data. It isn’t included in
udp2log, and therefore doesn’t affect any legacy analytics. Nor are we using it for any
productionized analytics in Hadoop. We aren’t sure if others are relying on this data for
ad hoc analysis, though.
If no one objects, we’d like to temporarily disable the bits varnishkafka instances. If
we do and the upload esams varnishes then stop having problems, we will know that this is
a system throughput issue. If the problems with esams upload continue, then it is more
likely a local varnishkafka issue.
So, are there any objections to removing webrequest bits from Kafka webrequest logs for a
little while?
-Ao
[1] varnishkafka rtt average:
<http://grafana.wikimedia.org/#/dashboard/db/kafka?from=1420484972895&to=1420571372895&panelId=10&fullscreen>
or
<http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&hreg%5B%5D=%28amssq%7Ccp%29.%2B&mreg%5B%5D=kafka.rdkafka.brokers..%2B%5C.rtt%5C.avg&gtype=line&title=kafka.rdkafka.brokers..%2B%5C.rtt%5C.avg&aggregate=1>
[2] varnishkafka delivery errors:
<http://grafana.wikimedia.org/#/dashboard/db/kafka?from=1420484972895&to=1420571372895&panelId=9&fullscreen>
or
<http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&hreg%5B%5D=%28amssq%7Ccp%29.%2B&mreg%5B%5D=kafka.varnishkafka%5C.kafka_drerr.per_second&gtype=line&title=kafka.varnishkafka%5C.kafka_drerr.per_second&aggregate=1>