Gilles Dubuc, 12/01/2015 14:23:
Federico, regarding geography and file types, the sample sizes I've been
looking at for the main phenomena shouldn't be affected by that, as
the size, type and geo mix over a large number of requests and a long
period of time should be fairly consistent. Of course what matters is
having large sample sizes.
I'm not an expert in statistics, but large samples only help if the two
things you're observing are not correlated. As an extreme example, imagine
all the Varnish misses are huge GIFs used on rare articles, which for some
reason fall "out of Varnish": then comparing them to a more balanced
dataset of smaller images wouldn't give us useful information. There is
certainly a reason those images are Varnish misses; it's not random,
so IMHO you can't assume the samples are unbiased and
representative. In other words, looking for common patterns in those
Varnish misses could offer more specific hints for optimisation.
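
To make the concern concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records, the "small"/"large" size buckets and the load times are made up, not measurements. It only shows how a naive hit-vs-miss comparison can mostly reflect the size mix, and how comparing within the same size bucket separates the two effects.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request records: (cache_status, size_bucket, load_time_ms).
# All values are invented for illustration; they are not real measurements.
requests = [
    ("hit", "small", 100), ("hit", "small", 110), ("hit", "small", 90),
    ("hit", "large", 400),
    ("miss", "small", 120),
    ("miss", "large", 420), ("miss", "large", 440), ("miss", "large", 430),
]


def mean_by(records, key):
    """Group records with `key` and return the mean load time per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return {k: mean(v) for k, v in groups.items()}


# Naive comparison: all hits vs all misses, ignoring the size mix.
naive = mean_by(requests, lambda r: r[0])
naive_gap = naive["miss"] - naive["hit"]

# Stratified comparison: hits vs misses within the same size bucket.
stratified = mean_by(requests, lambda r: (r[0], r[1]))
small_gap = stratified[("miss", "small")] - stratified[("hit", "small")]
large_gap = stratified[("miss", "large")] - stratified[("hit", "large")]

print("naive gap (ms):", naive_gap)
print("gap within small images (ms):", small_gap)
print("gap within large images (ms):", large_gap)
```

In this toy data the misses are mostly large images, so the naive gap is far bigger than the gap inside either bucket. Whether the real data behaves like this is exactly what would need checking.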
Nemo