There definitely is: it's done for mobile, for example, and Christian and I discussed it when I was experimenting with the sampled logs, but I can't find the thread right now. Bah :/

On 12 December 2014 at 09:41, Andrew Otto <aotto@wikimedia.org> wrote:
There must be some way to tag traffic as https or not at the nginx or varnish level, no?  Has anyone looked into this?


On Dec 11, 2014, at 18:27, Oliver Keyes <okeyes@wikimedia.org> wrote:



On 11 December 2014 at 11:52, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Oliver,

On Wed, Dec 10, 2014 at 08:22:18PM -0500, Oliver Keyes wrote:
> So, we've had conversations about detecting SSL terminators, for two
> reasons:
> [...]
> So: what's the right approach? How do we find these things easily and
> automagically?

The “right” approach depends a bit on the stream that you're looking
at. But I figure you're mostly interested in Hive data (for different
streams, there are other methods).

More or less the same question got asked on the internal list on
Sunday. There I pointed towards pybal:

On Sun, Dec 07, 2014 at 12:59:27PM +0100, Christian Aistleitner wrote:
> Hi,
>
> On Fri, Dec 05, 2014 at 03:23:45PM -0600, Aaron Halfaker wrote:
> > And wrote up some
> > brief notes in http://etherpad.wikimedia.org/p/ssl_terminators
>
> In that etherpad you wrote:
>
> Etherpad> * Scan through: https://github.com/wikimedia/operations-puppet/blob/production/manifests/site.pp
> Etherpad> * Look for anything with role::cache::*
>
> [...]
>
> If you want even less puppet munging, and a more robust format, you
> can instead go to pybal directly.
>
>   http://config-master.wikimedia.org/pybal/
>
> For example:
>
>   http://config-master.wikimedia.org/pybal/esams/text-https

I think that still holds true.

Does that approach not work, or are you just trying to get the
response to the public list? ;-)

If it's the former, please let me know where you think this approach
is failing.

If it's the latter ... yay for using the public list! ... here you
go. It's on the public list :-D


"yes" :D. I want to make these conversations public, and for us to lean more towards using the public list, but there was also a point of confusion about how we detected these machines using puppet. If pybal clarifies it, yay!

I'm not sure how to interpret the pybal output, but that's probably because my explanation of the problem was tremendously unclear. Essentially, we want to exclude internal IP space, because it carries a lot of automatically generated traffic (fundraising, I'm looking at you). So we exclude all requests from IPs within our ranges. Except that we then also exclude all the SSL traffic, since from the point of view of the request logs it appears to come from an internal IP address (that of the SSL terminator).

So, do I interpret this pybal list as: if a machine is tagged as HTTPS, it's an SSL terminator, and so requests from those machines should be included even though they come from internal IP addresses? Or: those are the SSL machines, so find out their IP addresses and you have the internal IPs that represent SSL requests rather than internally generated traffic?
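
To make the second reading concrete, here is a minimal sketch of the filter I have in mind. Everything in it is an assumption for illustration: the internal range, the terminator address and the name keep_request are made up, and the real terminator IPs would have to be derived from the hosts pybal lists under the https pools.

```python
import ipaddress

# Hypothetical values, for illustration only. These are NOT the real
# internal ranges or SSL terminator addresses.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
SSL_TERMINATOR_IPS = {ipaddress.ip_address("10.20.0.15")}


def keep_request(client_ip: str) -> bool:
    """Return True if a request should be kept in the analysis.

    Requests from internal IP space are dropped as automated traffic,
    unless the source address is a known SSL terminator, in which case
    the request is proxied HTTPS traffic and is kept.
    """
    ip = ipaddress.ip_address(client_ip)
    if ip in SSL_TERMINATOR_IPS:
        return True
    # Keep the request only if it falls in none of the internal ranges.
    return not any(ip in network for network in INTERNAL_NETWORKS)
```

With these made-up values, keep_request("10.20.0.15") keeps the request (terminator), keep_request("10.1.2.3") drops it (other internal host), and an external address such as "198.51.100.7" is kept.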

 
Have fun,
Christian


--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Oliver Keyes
Research Analyst
Wikimedia Foundation


