On Sat, Oct 19, 2013 at 6:24 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Matt,

On Fri, Oct 18, 2013 at 07:01:40PM -0700, Matthew Walker wrote:
> tldr; Do we have data on the number of compressed vs. uncompressed requests
> we serve?

Hoping that others chime in on this, as I could not find such data.
AFAIK, this is the first time we have received this data request.

As you suggested in IRC that you could use the output of
> https://gerrit.wikimedia.org/r/#/c/90667/
and compare sizes to determine whether the response was
compressed, I did some random checks:

* Comparing existing logs with Content-Lengths for Content-Encoding
  gzip and unencoded responses, field 7 of [1] indeed typically
  matches either of those lengths. So it does not hold the
  uncompressed length. I updated [1] accordingly.
Thanks Christian! Nice detective work.


* For a few random pages I checked our existing logs, and it seems
  that for <5% of requests field 7 of [1] matches the uncompressed
  length. For >90% it matches the gzipped length. I know it's
  completely unrepresentative, but I hope it helps for
  feasibility/order of magnitude computations.
It would be useful to have a table with the source of the webrequest logline (varnish, squid, nginx) and whether the response was compressed or not. As it seems that <5% of requests are affected, it could be that Nginx (SSL and IPv6 traffic) is configured slightly differently than the Varnish and Squid servers, but this is a hypothesis that should be checked. We can also check this by mimetype if the first hypothesis has inconclusive results.
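
A rough sketch of how such a table could be tallied, assuming we already know the gzipped and uncompressed Content-Length for each URL we sample. The record layout (source, URL, logged size), the function names, and the sample numbers are all made up for illustration; real loglines would first need to be parsed according to [1].

```python
from collections import Counter


def classify(logged_size, gzip_len, plain_len):
    """Label a logged size by which reference length it matches."""
    if logged_size == gzip_len:
        return "gzip"
    if logged_size == plain_len:
        return "uncompressed"
    # Neither length matches (e.g. redirects, errors, changed page).
    return "unknown"


def tabulate(records, reference_lengths):
    """Count (source, label) pairs.

    records: iterable of (source, url, logged_size) tuples (hypothetical layout)
    reference_lengths: dict mapping url -> (gzip_len, plain_len)
    """
    counts = Counter()
    for source, url, size in records:
        if url not in reference_lengths:
            continue  # no reference lengths for this URL, skip it
        gzip_len, plain_len = reference_lengths[url]
        counts[(source, classify(size, gzip_len, plain_len))] += 1
    return counts


# Made-up sample values, not real log data.
reference_lengths = {"/wiki/Foo": (12000, 48000)}
records = [
    ("varnish", "/wiki/Foo", 12000),
    ("varnish", "/wiki/Foo", 12000),
    ("nginx", "/wiki/Foo", 48000),
    ("squid", "/wiki/Foo", 301),
]

table = tabulate(records, reference_lengths)
for (source, label), n in sorted(table.items()):
    print(f"{source:8} {label:13} {n}")
```

Keeping an explicit "unknown" bucket seems safer than forcing every line into one of the two classes, since a size can match neither length. The same tally could be keyed on mimetype instead of source for the second check.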

D


[1] https://wikitech.wikimedia.org/wiki/Cache_log_format


--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a        Email:  christian@quelltextlich.at
4040 Linz, Austria           Phone:          +43 732 / 26 95 63
                             Fax:            +43 732 / 26 95 63
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------
