tldr; Do we have data on the number of compressed vs. uncompressed requests we serve?
Hey all,
I'm investigating a fundraising issue: banners that should be about the same size compressed, but which differ in size once decompressed, show markedly different conversion rates (the only difference between them is the banner name, which affects content length).
One of the angles I'm investigating is whether we're serving a significant number of banners uncompressed, which would affect how long they take to appear on the site. If we already have this data, I can compare it against data I'm going to collect from the banner stream [1].
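As a rough sketch of the kind of check I have in mind: even without headers, you can tell whether a payload went over the wire gzipped by looking at the gzip magic bytes. The banner body below is made up for illustration:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def looks_gzipped(payload: bytes) -> bool:
    """Cheap check for gzip content when Content-Encoding isn't available."""
    return payload[:2] == GZIP_MAGIC

# Hypothetical banner body, repeated the way keyword expansion pads content.
banner = b'<div class="banner">Donate today, ReaderLongName!</div>' * 20
compressed = gzip.compress(banner)

print(looks_gzipped(banner))       # False: served uncompressed
print(looks_gzipped(compressed))   # True: served gzipped
print(len(banner), len(compressed))
```

The size gap between the two prints is exactly the delivery-time difference I'm worried about.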
Another thing I'm considering is whether it takes the caching layer longer to retrieve and serve certain banner content and/or cache keys.
-- The Data -- For the truly curious, the two tests I've run so far that led me down this path work like this: take two banners with identical content (cloned) but different names. Because names get substituted into the banners multiple times through keyword expansion, the content lengths end up different. Then count how many clicks each banner gets. This is multivariate, with the two variables being content length and cache key.
Cache key setup 1 (long name has a worse spot in the cache):
  Short name: 0.22% success rate (155,300 samples)
  Long name:  0.19% success rate (160,800 samples)
The 95% confidence interval puts the long name's relative performance between -31% and +3% compared to the short name, with a power of 0.014.
Cache key setup 2 (long name has a better spot in the cache):
  Short name: 0.20% success rate (294,900 samples)
  Long name:  0.19% success rate (309,500 samples)
The 95% CI here still has the long name performing worse, but with power so low as to be effectively useless.
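For anyone who wants to sanity-check the intervals, they can be roughly reproduced with a standard normal approximation for the difference of two proportions. This is a sketch, not the exact script I used, and the rates/sample counts above are rounded, so the endpoints come out slightly different:

```python
import math

def relative_diff_ci(p1, n1, p2, n2, z=1.96):
    """95% CI for (p2 - p1) / p1, i.e. the long banner's performance
    relative to the short one, via the two-proportion normal approximation."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return (diff - z * se) / p1, (diff + z * se) / p1

# Cache key setup 1: short 0.22% of ~155,300; long 0.19% of ~160,800
lo, hi = relative_diff_ci(0.0022, 155300, 0.0019, 160800)
print(f"long vs. short: {lo:+.0%} to {hi:+.0%}")
```

With the rounded inputs this lands near -28% to +1%; the exact unrounded counts would shift it toward the -31% to +3% quoted above.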
[1] https://gerrit.wikimedia.org/r/#/c/90667/
~Matt Walker
Wikimedia Foundation Fundraising Technology Team
Hi Matt,
On Fri, Oct 18, 2013 at 07:01:40PM -0700, Matthew Walker wrote:
tldr; Do we have data on the number of compressed vs. uncompressed requests we serve?
Hoping that others chime in on this, as I could not find such data.
As you suggested in IRC that you could use the logged output and compare sizes to determine whether or not a response was compressed, I did some random checks:
* Comparing existing logs with Content-Lengths for Content-Encoding gzip and unencoded responses, field 7 of [1] indeed typically matches either of those lengths. So it does not hold the uncompressed length. I updated [1] accordingly.
* For a few random pages I checked our existing logs, and it seems that for <5% of requests field 7 of [1] matches the uncompressed length, while for >90% it matches the gzipped length. I know it's completely unrepresentative, but I hope it helps for feasibility/order-of-magnitude computations.
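The check above can be sketched in a few lines: given a response body, the logged byte count should match one of two candidate lengths. The example body is hypothetical, and note that real compressed lengths depend on the cache's gzip settings, so this assumes Python's defaults:

```python
import gzip

def classify_logged_length(field7: int, body: bytes) -> str:
    """Guess which representation a logged byte count (field 7 of the
    cache log format) refers to, assuming default gzip settings."""
    if field7 == len(body):
        return "uncompressed"
    if field7 == len(gzip.compress(body)):
        return "gzip"
    return "unknown"

# Hypothetical banner body; a real check would replay the logged URL.
body = b'<div class="banner">Donate today!</div>' * 25
print(classify_logged_length(len(body), body))                 # uncompressed
print(classify_logged_length(len(gzip.compress(body)), body))  # gzip
```

Running this over a log sample gives exactly the <5% / >90% split described above, plus an "unknown" bucket worth investigating.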
Best regards, Christian
[1] https://wikitech.wikimedia.org/wiki/Cache_log_format
Hi Matt,
On Sat, Oct 19, 2013 at 12:24:13PM +0200, Christian Aistleitner wrote:
- Comparing existing logs with Content-Lengths for Content-Encoding gzip and unencoded responses, field 7 of [1] indeed typically matches either of those lengths. So it does not hold the uncompressed length. I updated [1] accordingly.
What a confusing paragraph ... let me try again.
* Comparing existing logs with current Content-Lengths for Content-Encoding gzip and unencoded responses, field 7 of [1] indeed typically matches the length of one of the two. So field 7 of [1] does not seem bound to hold the uncompressed length. I updated [1] accordingly.
Have fun, Christian
On Sat, Oct 19, 2013 at 6:24 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Matt,
On Fri, Oct 18, 2013 at 07:01:40PM -0700, Matthew Walker wrote:
tldr; Do we have data on the number of compressed vs. uncompressed requests we serve?
Hoping that others chime in on this, as I could not find such data.
AFAIK, this is the first time that we get this data request.
As you suggested in IRC that you could use the logged output and compare sizes to determine whether or not a response was compressed, I did some random checks:
- Comparing existing logs with Content-Lengths for Content-Encoding gzip and unencoded responses, field 7 of [1] indeed typically matches either of those lengths. So it does not hold the uncompressed length. I updated [1] accordingly.
Thanks Christian! Nice detective work.
- For a few random pages I checked our existing logs, and it seems that for <5% requests field 7 of [1] matches the uncompressed length. For >90% it matches the gzipped length. I know it's completely unrepresentative, but I hope it helps for feasibility/order of magnitude computations.
It would be useful to have a table with the source of the webrequest log line (Varnish, Squid, Nginx) and whether the response is compressed or not. As it seems that <5% of requests are affected, it could be that Nginx (which handles SSL and IPv6 traffic) is configured slightly differently than the Varnish and Squid servers, but this is a hypothesis that should be checked. We could also break this down by MIME type if the first hypothesis turns out inconclusive.
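A first cut at that table could be as simple as counting (source, encoding) pairs over a log sample. The field layout and record values below are made up for illustration — real records would come from parsing the webrequest log lines:

```python
from collections import Counter

# Hypothetical pre-parsed log records: (source, content_encoding)
records = [
    ("varnish", "gzip"),
    ("varnish", "gzip"),
    ("squid",   "gzip"),
    ("nginx",   "identity"),   # uncompressed
    ("nginx",   "gzip"),
]

table = Counter(records)
for (source, encoding), count in sorted(table.items()):
    print(f"{source:8} {encoding:9} {count}")
```

If the Nginx hypothesis holds, the "identity" rows should cluster under the nginx source.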
D
[1] https://wikitech.wikimedia.org/wiki/Cache_log_format
--
---- quelltextlich e.U. ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a, 4040 Linz, Austria
Email: christian@quelltextlich.at
Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics