Hi Andrew,
On Fri, Dec 12, 2014 at 09:41:11AM -0500, Andrew Otto wrote:
> There must be some way to tag traffic as https or not at the
> nginx or varnish level, no? Has anyone looked into this?
Yes. On the mobile caches, varnish adds a https=1 tag to the
X-Analytics field [1].
But as nice and easy as Varnish tagging looks on the outside, it has
burned us many times in many different ways around Wikipedia Zero.
The fact that we cannot run written logs through VCL logic again is a
deal breaker.
So assume we extend the above https=1 Varnish tagging to bits, text,
and upload too.
Then we build analytics machinery relying on those tags.
That is nice and shiny until varnish tagging breaks for the first
time (and it will break for sure). Typically, we won't notice
immediately, but only some time afterwards.
Say two days after it happened.
How would we re-process the data for those two days?
I do not know of a way to automatically pass our written logs through
the VCL tagging machinery again. Hence, (to make up for the mistagging
of those two days) we'd have to re-implement the Varnish logic in the
cluster and re-tag all log lines somewhere on the cluster.
So at the end of the day, we:
* Have implemented https tagging logic in Varnish.
* Have implemented https tagging logic in the cluster.
* Maybe have to keep those two implementations in sync.
* Are scared of Varnish's https tagging breaking again (at least I would be).
We can remove 3 of those 4 items if we implement https tagging in the
cluster right away. We cannot escape the cluster implementation anyway
if we want good data. And it removes so much pressure.
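
To illustrate why tagging in the cluster takes the pressure off, here is
a minimal Python sketch of re-tagging already written logs for a date
range. The record layout (dicts with "day", "ip" and "https" keys), the
function name retag and the toy detection rule are all made up for this
example; they are not our actual schema or code.

```python
from datetime import date


def retag(records, first_day, last_day, detect):
    """Recompute the 'https' field for records dated in [first_day, last_day].

    `records` is an iterable of dicts with a 'day' key (datetime.date).
    `detect` is any function mapping a record to True/False.
    Records outside the range pass through unchanged.
    """
    for record in records:
        if first_day <= record["day"] <= last_day:
            # Copy instead of mutating, so the written logs stay untouched.
            record = {**record, "https": detect(record)}
        yield record


# Hypothetical example: tagging was broken on Dec 10 and 11.
logs = [
    {"day": date(2014, 12, 9), "ip": "203.0.113.7", "https": False},
    {"day": date(2014, 12, 10), "ip": "198.51.100.4", "https": False},
]
fixed = list(
    retag(
        logs,
        date(2014, 12, 10),
        date(2014, 12, 11),
        # Toy rule, only for illustration.
        detect=lambda r: r["ip"].startswith("198.51.100."),
    )
)
```

Since the rule is a plain function over the written logs, fixing it and
re-running it over the affected days is all the recovery needs.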
Have fun,
Christian
P.S.: How we implement https tagging in the cluster is up for
discussion.
Detecting IPs has good (not perfect) quality and is pretty robust
against misconfigurations in the pipeline.
We can do that as of today.
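
A rough Python sketch of what IP detection could look like, assuming
https requests reach the caches through a known set of SSL terminator
addresses, so the logged IP gives them away. The networks below are
documentation ranges used as placeholders, not our real terminators,
and the function name is made up.

```python
import ipaddress

# Placeholder ranges (reserved documentation addresses), NOT real terminators.
SSL_TERMINATOR_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("2001:db8::/64"),
]


def is_https_by_ip(ip):
    """Return True if the logged IP falls into a known SSL terminator network."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Garbage or missing IP field (e.g. "-"): treat as not https.
        return False
    return any(address in network for network in SSL_TERMINATOR_NETWORKS)
```

The "not perfect" part shows up here too: the list of networks has to be
kept current, and a request from an unlisted terminator gets mistagged.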
An alternative might be to start tracking X-Forwarded-Proto, which
would be way simpler than the IP approach. It has good quality too
and is way more robust than X-Analytics. But it would need more
research, and would again require us to add a column to the logging
format (which last time made the table explode).
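
For comparison, a minimal Python sketch of the X-Forwarded-Proto variant,
assuming the header value ends up in the logs as a plain string. The
function name is made up, and taking the first entry of a comma-separated
value is an assumption about how chained proxies might fill the header.

```python
def is_https_by_forwarded_proto(header_value):
    """Return True if the logged X-Forwarded-Proto value says https."""
    if not header_value:
        # Empty or missing column: treat as not https.
        return False
    # Assumption: with several proxies the value may be a comma-separated
    # list; the first entry is taken as the client-facing protocol.
    first_entry = header_value.split(",")[0].strip().lower()
    return first_entry == "https"
```

This is the whole rule, which is why it would be so much simpler than
maintaining a list of IPs.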
[1] See row “https” in
https://wikitech.wikimedia.org/wiki/X-Analytics
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------