New subject: Monthly data loss on udp2log quantified in new report

29 Jul 2013

Hi all,

Over the years we've had several serious issues with huge underreporting of
page view data due to message loss on udp2log.

There are now several diagnostic tools: alerts are sent and there is
real-time monitoring http://tinyurl.com/kqmtfss

But none of those help to quantify total monthly loss.

I upgraded an existing CSV file to an HTML report, to be updated monthly.

http://stats.wikimedia.org/wikimedia/squids/SquidDataMonthlyPerSquidSet.htm

This report shows total monthly message loss as a percentage, plus a
breakdown of message loss and traffic volume by server role and location.

The basic idea behind the report is that, since we use 1:1000 sampling, the
sequence numbers of consecutive logged messages from each squid server should
be 1000 apart, on average.

If we actually find they are 1050 apart, that translates into about 4.8% data
loss (50 out of every 1050 messages never arrived).
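As a rough illustration of that arithmetic, here is a minimal Python sketch. It is not the code behind the report; the function name and the default sampling rate parameter are hypothetical and only mirror the 1:1000 example above.

```python
def loss_fraction(avg_gap, sampling_rate=1000):
    """Estimate the fraction of messages lost for one server.

    avg_gap: observed average distance between the sequence numbers of
             consecutive logged messages (hypothetical input).
    sampling_rate: expected distance under 1:N sampling (assumed 1000).
    """
    if avg_gap <= 0:
        raise ValueError("avg_gap must be positive")
    # With no loss the gap equals the sampling rate. A larger gap means
    # that only sampling_rate / avg_gap of the expected messages arrived.
    return max(0.0, 1.0 - sampling_rate / avg_gap)


# The example from the text: a gap of 1050 instead of 1000.
print(f"{loss_fraction(1050):.4f}")
```

A gap equal to the sampling rate gives a loss of zero, and the result is clamped at zero if the observed gap happens to be smaller than expected.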

For details on how this is calculated, see
http://stats.wikimedia.org/wikimedia/squids/SquidDataMonthlyPerSquidSet.htm#calc

I use a weighted average for calculating total percentage data loss, taking
into account data volume per server cluster, and ignoring servers where the
sequence number mechanism is still broken (ssl servers).
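
The weighting can be sketched as follows. This is only an illustration under stated assumptions, not the report's actual code: the tuple layout, the cluster names, the numbers, and the `seq_ok` flag are all hypothetical, and "data volume" is assumed to mean the message count per cluster.

```python
def total_loss(clusters):
    """Volume-weighted average loss over clusters with usable sequence numbers.

    clusters: iterable of (name, volume, loss, seq_ok) tuples, where
      volume is the message count for the cluster (assumed weight),
      loss   is the estimated loss fraction for the cluster,
      seq_ok is False where sequence numbers are broken (e.g. ssl servers).
    """
    usable = [(volume, loss) for _, volume, loss, seq_ok in clusters if seq_ok]
    total_volume = sum(volume for volume, _ in usable)
    if total_volume == 0:
        raise ValueError("no cluster with usable sequence numbers")
    # Each cluster contributes in proportion to its share of the volume.
    return sum(volume * loss for volume, loss in usable) / total_volume


# Made-up numbers, purely for illustration.
example = [
    ("text", 900, 0.01, True),
    ("upload", 100, 0.11, True),
    ("ssl", 500, 0.90, False),  # ignored: sequence numbers unreliable
]
print(f"{total_loss(example):.3f}")
```

Excluding the broken clusters from both the numerator and the total volume keeps their unreliable figures from skewing the overall percentage.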

The role and implementation of udp2log are in flux. But in any setup it would
be good to have such an overall assessment of loss.

Cheers,

Erik