On Tue, May 20, 2014 at 5:21 AM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:
Unfortunately it looks like the 1:1000 sampling since last Friday was too
extreme and destroyed information, even for the actions that were the most
numerous. We knew that such a high sampling factor was going to destroy
information for small wikis or metrics with low figures, but even the huge
metrics in the millions have become unreliable. I'm saying that because
multiplying even the largest figures by 1000 still doesn't give an estimate
close to what it was before the change, which means that the actions graph
probably won't be fixable for the period from last Friday until my fixes make
it through. Even compensating for the sampling (by multiplying the figures by
1000), the line would jump up and down every day for each metric.
There is a big spike every weekend in the unsampled logs as well, so the fact
that the numbers jump around between Friday and now is not necessarily a
sampling artifact.
Still, the sampling ratio was chosen aggressively and could be decreased
if needed:

10:46 < ori> operationally i can tell you that 1:1000 and even 1:100 are totally fine
Is there a "scientific" way of choosing the right sampling? Like setting a
certain standard deviation we should be aiming for, and then working backwards
from that?
Nuria already said that for percentiles we want 1000 events per bucket, which
means 100,000 events daily for a 99th percentile graph (that's the highest we
have currently). We were getting ~3M duration log events a day, so the
conservative choice would be 1:10, after which MultimediaViewerDuration logs
would account for ~1% of the EventLogging traffic.
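
To spell out that arithmetic (treating "1000 events per bucket" as 1000
sampled events per percentile bucket, and a 99th percentile graph as needing
100 buckets):

events_per_bucket = 1000      # Nuria's guideline: 1000 events per bucket
buckets_for_p99 = 100         # a 99th percentile graph needs 100 buckets
needed_per_day = events_per_bucket * buckets_for_p99  # 100,000 sampled events/day

raw_duration_events_per_day = 3000000  # ~3M MultimediaViewerDuration events/day

max_factor = raw_duration_events_per_day // needed_per_day
print(max_factor)  # 30, so 1:10 leaves a comfortable margin
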
For action events, we were getting about 15M a day, and we only use them to
show total counts (daily number of clicks, etc.). How do we tell when the
sampling ratio is right for that?
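
Using the same back-of-the-envelope approximation as above, one way to tell
would be to look at the expected day-to-day noise of the scaled-up counts at a
given factor; the small-wiki figure below is made up just to show the contrast:

import math

def relative_error(daily_events, factor):
    # Approximate relative std. dev. of factor * (sampled count), assuming
    # independent per-event sampling at a rate of 1/factor.
    return math.sqrt(factor / daily_events)

# Site-wide action events, ~15M/day from the unsampled logs:
print(relative_error(15000000, 1000))  # ~0.008, i.e. under 1% noise at 1:1000

# A hypothetical small wiki or rare action with ~10k events/day:
print(relative_error(10000, 1000))     # ~0.32, so its graph would jump around a lot
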