Hi Andrew,
On Fri, Dec 12, 2014 at 09:41:11AM -0500, Andrew Otto wrote:
> There must be some way to tag traffic as https or not at the nginx or varnish level, no? Has anyone looked into this?
Yes. On the mobile caches, varnish adds a https=1 tag to the X-Analytics field [1].
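For reference, reading that tag back out of a logged X-Analytics value could look roughly like the sketch below. It assumes the field is a semicolon-separated list of key=value pairs, as described at [1], and that an absent field is logged as "-". The function names are made up for illustration.

```python
def parse_x_analytics(field):
    """Split an X-Analytics value such as 'zero=250-99;https=1' into a dict.

    Assumption: pairs are separated by ';' and keys from values by '='.
    Empty values, '-' placeholders and malformed pairs yield no tags.
    """
    tags = {}
    if not field or field == "-":
        return tags
    for pair in field.split(";"):
        key, sep, value = pair.partition("=")
        key = key.strip()
        if sep and key:
            tags[key] = value.strip()
    return tags


def is_https_tagged(field):
    """Return True only if the field carries an explicit https=1 tag."""
    return parse_x_analytics(field).get("https") == "1"
```

Note that a missing tag is indistinguishable from broken tagging here, which is exactly the weakness discussed next.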
As nice and easy as Varnish tagging looks from the outside, it has burned us many times and in many different ways around Wikipedia Zero. The deal breaker is that we cannot run already-written logs through the VCL logic again.
So assume we extend the above https=1 Varnish tagging to bits, text, and upload too. Then we build analytics machinery relying on those tags. That is nice and shiny until varnish tagging breaks for the first time (and it will break for sure). Typically, we won't notice immediately, but only some time afterwards. Say two days after it happened. How would we re-process the data for those two days?
I do not know of a way to automatically pass our written logs through the VCL tagging machinery again. Hence, to make up for the mistagging of those two days, we would have to re-implement the Varnish logic on the cluster and re-tag all affected log lines there.
So at the end of the day, we:
* Have implemented https tagging logic in Varnish.
* Have implemented https tagging logic in the cluster.
* Maybe have to keep those two implementations in sync.
* Are scared of Varnish's https tagging breaking again (at least I would be).
We can remove three of those four items if we implement https tagging in the cluster right away. We cannot escape the cluster implementation anyway if we want good data, and doing it up front removes a lot of pressure.
Have fun,
Christian
P.S.: How we implement https tagging in the cluster is up for discussion.
Detecting IPs has good (not perfect) quality and is pretty robust against misconfigurations on the pipeline. We can do that as of today.
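A minimal sketch of what such an IP check could look like on the cluster side. It assumes that "detecting IPs" means matching the logged client IP against the known addresses of the SSL terminators, which is my reading and not spelled out above. The networks below are documentation-only placeholder ranges, not our real ones, and the names are hypothetical.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation networks).
# Assumption: the real list would hold the SSL terminators' addresses.
SSL_TERMINATOR_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("2001:db8::/64"),
]


def is_https_by_ip(ip_field):
    """Guess whether a log line was HTTPS from its logged IP.

    Returns False for unparsable values instead of raising, so a
    single bad log line does not abort a re-processing run.
    """
    try:
        addr = ipaddress.ip_address(ip_field.strip())
    except ValueError:
        return False
    # Membership tests across IP versions simply evaluate to False.
    return any(addr in net for net in SSL_TERMINATOR_NETWORKS)
```

Because this only looks at data that is already in the written logs, it can be re-run over old log lines whenever the list of networks turns out to be wrong.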
An alternative might be to start tracking X-Forwarded-Proto, which would be way simpler than the IP approach. It has good quality too and is way more robust than X-Analytics. But that would need more research, and it would again require adding a column to the logging format (which made the table explode last time).
[1] See row “https” in
https://wikitech.wikimedia.org/wiki/X-Analytics