tl;dr: Do we have data on the number of compressed vs. uncompressed requests
we serve?
Hey all,
I'm investigating a fundraising issue where it appears that banners that
should be about the same size when compressed, but which differ in size
once decompressed, show markedly different conversion rates (the only
difference between them is the banner name, which affects the content
length).
One of the angles I'm investigating is whether we're serving a significant
number of banners uncompressed, which would affect how long they take to
appear on the site. If we already have this data, I can compare it against
the data I'm going to collect from the banner stream [1].
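If that data doesn't exist yet, one rough way to get it would be to count `Content-Encoding: gzip` responses in a sample of request logs. A minimal sketch — the log format and field names here are hypothetical, not our actual log schema:

```python
# Count compressed vs. uncompressed responses in a sample of access-log
# lines. Hypothetical format: one JSON object per line with a
# "response_headers" field. Adjust the parsing to the real log schema.
import json

def count_compression(log_lines):
    """Return (compressed, uncompressed) counts for the given log lines."""
    compressed = uncompressed = 0
    for line in log_lines:
        entry = json.loads(line)
        headers = entry.get("response_headers", {})
        if "gzip" in headers.get("Content-Encoding", "").lower():
            compressed += 1
        else:
            uncompressed += 1
    return compressed, uncompressed

# Tiny fabricated sample, for illustration only.
sample = [
    '{"response_headers": {"Content-Encoding": "gzip"}}',
    '{"response_headers": {}}',
    '{"response_headers": {"Content-Encoding": "gzip"}}',
]
print(count_compression(sample))  # (2, 1)
```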
Another possibility I'm considering is that the caching layer takes longer
to retrieve and serve certain banner content and/or cache keys.
-- The Data --
For the truly curious: in the two tests I've run so far that led me down
this path, I created two banners with identical (cloned) content but
different names. Because the name gets substituted into the banner multiple
times through keyword expansion, the content lengths end up different. I
then measured how many clicks each banner got. This is a multivariate test
with two variables: content length and cache key.
Cache key setup 1 (Long name has a worse spot in the cache):
Short Name: 0.22% success rate (155300 samples)
Long Name: 0.19% success rate (160800 samples)
The 95% confidence interval has the long name performing between 31% worse
and 3% better than the short name, with a statistical power of 0.014.
Cache key setup 2 (Long name has a better spot in the cache):
Short Name: 0.20% success rate (294900 samples)
Long Name: 0.19% success rate (309500 samples)
The 95% CI here still has the long name performing worse, but with
statistical power so low as to be effectively useless.
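For reference, a confidence interval like the ones above can be sketched with a normal approximation for the difference of two proportions, expressed relative to the short-name rate. This is an illustrative approximation and may not match the exact method behind the numbers quoted above:

```python
# Sketch: 95% CI for the relative difference between two conversion rates,
# (p_long - p_short) / p_short, using a normal approximation for the
# difference of two independent proportions.
import math

def relative_diff_ci(p_short, n_short, p_long, n_long, z=1.96):
    """Return (low, high) bounds on the relative difference vs. p_short."""
    diff = p_long - p_short
    se = math.sqrt(p_short * (1 - p_short) / n_short +
                   p_long * (1 - p_long) / n_long)
    return (diff - z * se) / p_short, (diff + z * se) / p_short

# Setup 1: short name 0.22% of 155300 samples, long 0.19% of 160800.
lo, hi = relative_diff_ci(0.0022, 155300, 0.0019, 160800)
print("long vs. short name: %+.0f%% to %+.0f%%" % (lo * 100, hi * 100))
```

The interval straddles zero, which is why neither setup gives a conclusive answer on its own.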
[1]
https://gerrit.wikimedia.org/r/#/c/90667/
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team