Gilles Dubuc, 12/01/2015 14:23:
Federico, regarding geography and file types, the sample sizes I've been looking at for the main phenomena shouldn't be affected by that, as the mix of size, type and geography over a large number of requests and a long period of time should be fairly consistent. Of course, what matters is having large sample sizes.
I'm not an expert in statistics, but that only holds if the two things you're observing are not correlated, i.e. if being a varnish miss is independent of size, type and geography. As an extreme example, imagine all the varnish misses are huge GIFs used on rare articles, which for some reason get "out of varnish": comparing them to a more balanced dataset of smaller images wouldn't give us useful information, however large the sample. There is certainly a reason those images are varnish misses, it's not a random thing, so IMHO you can't assume the samples are unbiased and representative. In other words, looking for common patterns in those varnish misses could offer more specific hints for optimisation.
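
To make the concern concrete, here is a minimal sketch with entirely made-up numbers (the `requests` list, its field layout and the helper names are hypothetical, not taken from any real log). It compares the hit/miss gap in load time computed over everything at once with the gap computed separately per file type. If misses are mostly large GIFs, the overall gap mostly reflects the file type mix rather than the cost of a miss.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request log: (file_type, cache_status, load_time_ms).
# All values are invented for illustration only.
requests = [
    ("jpeg", "hit", 100), ("jpeg", "hit", 120), ("jpeg", "hit", 110),
    ("jpeg", "miss", 130),
    ("gif", "hit", 900),
    ("gif", "miss", 920), ("gif", "miss", 940), ("gif", "miss", 930),
]

def naive_gap(rows):
    """Mean load time of misses minus mean load time of hits, ignoring type."""
    by_status = defaultdict(list)
    for _file_type, status, ms in rows:
        by_status[status].append(ms)
    return mean(by_status["miss"]) - mean(by_status["hit"])

def stratified_gaps(rows):
    """The same miss-minus-hit gap, computed separately for each file type."""
    by_type = defaultdict(list)
    for row in rows:
        by_type[row[0]].append(row)
    return {file_type: naive_gap(group) for file_type, group in by_type.items()}

print("overall gap (ms):", naive_gap(requests))
print("gap per file type (ms):", stratified_gaps(requests))
```

With these invented values the overall gap comes out far larger than the gap within either file type, because the misses are dominated by GIFs. Breaking real data down the same way (by type, size bucket or geography) would show whether the mix really is consistent between hits and misses.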
Nemo