This is awesome, thanks Erik!
Alongside this, there are similar Ganglia metrics for these numbers on each of the udp2log
boxes. Example:
http://bit.ly/13pkK3e
You can see a similar breakdown of packet_loss_average per role. These roles are defined
by the pybal config:
http://noc.wikimedia.org/pybal/
The packet_loss_average metric is sampled at 1/10 rather than 1/1000, so it should be
slightly more accurate. However, these metrics are not weighted by traffic volume, so
any loss from a role that serves very few requests will skew the average.
Having both of these available for troubleshooting is very useful.
Thanks again!
-Ao
On Jul 29, 2013, at 10:39 AM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Hi all,
Over the years we've had several serious issues with huge underreporting on page view
data due to message loss on udp2log.
There are now several diagnostic tools: alerts are sent and there is real-time
monitoring: http://tinyurl.com/kqmtfss
But none of those help to quantify total monthly loss.
I upgraded an existing CSV file to an HTML report, to be updated monthly.
http://stats.wikimedia.org/wikimedia/squids/SquidDataMonthlyPerSquidSet.htm
This report shows total monthly message loss as a percentage, plus a breakdown of message
loss and traffic volume by server role and location.
The basic idea behind the report is that, as we use 1:1000 sampling, for each squid server
we should find sequence numbers between logged messages to be 1000 apart, on average.
If we actually find they are 1050 apart, that translates into about 4.8% data loss
(50 out of every 1050 messages).
On how this is calculated, see
http://stats.wikimedia.org/wikimedia/squids/SquidDataMonthlyPerSquidSet.…
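To make the arithmetic concrete, here is a minimal Python sketch of the idea, not the actual report code. The function names `mean_gap` and `loss_fraction` are hypothetical, and it assumes sequence numbers increase monotonically per server with no wrap-around or resets.

```python
def mean_gap(sequence_numbers):
    """Average distance between consecutive logged sequence numbers.

    Assumes the list is sorted and comes from a single squid server.
    """
    if len(sequence_numbers) < 2:
        raise ValueError("need at least two logged messages")
    span = sequence_numbers[-1] - sequence_numbers[0]
    return span / (len(sequence_numbers) - 1)


def loss_fraction(observed_gap, sampling_rate=1000):
    """Estimated fraction of messages lost before sampling.

    With 1:N sampling and no loss, logged messages are N apart.
    If a fraction p is lost, the gap grows to N / (1 - p),
    so p = 1 - N / observed_gap.
    """
    if observed_gap <= 0:
        raise ValueError("observed_gap must be positive")
    # Clamp at zero: a gap at or below the sampling rate means no measurable loss.
    return max(0.0, 1.0 - sampling_rate / observed_gap)


# Hypothetical sample: four logged messages, each 1050 apart.
gap = mean_gap([0, 1050, 2100, 3150])
print(f"mean gap {gap:.0f}, estimated loss {loss_fraction(gap):.2%}")
```

A gap of exactly 1000 gives zero loss, and larger gaps give proportionally more.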
I use a weighted average for calculating total percentage data loss, taking into account
data volume per server cluster, and ignoring servers where the sequence number mechanism
is still broken (ssl servers).
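As a rough illustration of the weighting, here is a small Python sketch. The function name `weighted_loss` and the cluster figures are invented for the example and are not taken from the report. Clusters whose sequence numbers cannot be trusted are marked with `None` and left out of both the numerator and the denominator.

```python
def weighted_loss(clusters):
    """Volume-weighted average loss over server clusters.

    clusters: iterable of (volume, loss) pairs, where loss is a fraction
    in [0, 1], or None when the sequence numbers are unusable.
    """
    usable = [(volume, loss) for volume, loss in clusters if loss is not None]
    total_volume = sum(volume for volume, _ in usable)
    if total_volume == 0:
        raise ValueError("no usable traffic volume")
    return sum(volume * loss for volume, loss in usable) / total_volume


# Hypothetical figures: (messages logged, estimated loss fraction).
clusters = [
    (900, 0.01),   # busy cluster, little loss
    (100, 0.20),   # quiet cluster, heavy loss
    (500, None),   # sequence numbers broken, ignored
]

usable_losses = [loss for _, loss in clusters if loss is not None]
unweighted = sum(usable_losses) / len(usable_losses)

print(f"weighted:   {weighted_loss(clusters):.1%}")
print(f"unweighted: {unweighted:.1%}")
```

With these made-up numbers the quiet cluster dominates a plain average, while the weighted figure stays close to what most of the traffic actually experienced.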
The role and implementation of udp2log are in flux. But in any setup it would be good to
have such an overall assessment of loss.
Cheers,
Erik
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics