There must be some way to tag traffic as HTTPS or not at the nginx or varnish level, no? Has anyone looked into this?
On Dec 11, 2014, at 18:27, Oliver Keyes okeyes@wikimedia.org wrote:
On 11 December 2014 at 11:52, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Oliver,
On Wed, Dec 10, 2014 at 08:22:18PM -0500, Oliver Keyes wrote:
So, we've had conversations about detecting SSL terminators, for two reasons: [...] So: what's the right approach? How do we find these things easily and automagically?
The “right” approach depends a bit on the stream that you're looking at. But I figure you're mostly interested in Hive data (for different streams, there are other methods).
More or less the same question got asked on the internal list on Sunday. There I pointed towards pybal:
On Sun, Dec 07, 2014 at 12:59:27PM +0100, Christian Aistleitner wrote:
Hi,
On Fri, Dec 05, 2014 at 03:23:45PM -0600, Aaron Halfaker wrote:
And wrote up some brief notes in http://etherpad.wikimedia.org/p/ssl_terminators
In that etherpad you wrote:
Etherpad> * Scan through: https://github.com/wikimedia/operations-puppet/blob/production/manifests/site.pp
Etherpad> * Look for anything with role::cache::*
[...]
If you want even less puppet munging, and a more robust format, you can instead go to pybal directly.
http://config-master.wikimedia.org/pybal/
For example:
http://config-master.wikimedia.org/pybal/esams/text-https
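As a rough illustration of reading such a pool file, here is a minimal Python sketch. It assumes each non-comment line is a Python-style dict literal with a 'host' key and an optional 'enabled' flag; that layout is an assumption to check against the actual files, and the sample hosts are made up. The function name parse_pybal_pool is hypothetical.

```python
import ast


def parse_pybal_pool(text, enabled_only=False):
    """Return host names found in the text of a pybal pool file.

    Assumes one Python-style dict literal per line (an assumption, not
    a documented guarantee). Lines that do not parse are skipped.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        try:
            entry = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a literal we understand; ignore it
        if not isinstance(entry, dict) or "host" not in entry:
            continue
        # Optionally drop servers that are marked as disabled.
        if enabled_only and not entry.get("enabled", True):
            continue
        hosts.append(entry["host"])
    return hosts


# Made-up sample in the assumed format; not real pybal output.
SAMPLE = """\
# hypothetical pool
{ 'host': 'cp-example-1.esams.example.org', 'weight': 10, 'enabled': True }
{ 'host': 'cp-example-2.esams.example.org', 'weight': 10, 'enabled': False }
"""

print(parse_pybal_pool(SAMPLE))
print(parse_pybal_pool(SAMPLE, enabled_only=True))
```

For log analysis you would probably keep disabled hosts too (the default here), since a machine that is depooled today may still show up in older request logs. Resolving the host names to IP addresses is left out on purpose.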
I think that still holds true.
Does that approach not work, or are you just trying to get the response to the public list? ;-)
If it's the former, please let me know where you think this approach is failing.
If it's the latter ... yay for using the public list! ... here you go. It's on the public list :-D
"yes" :D. I want to make these conversations public, and for us to bias more towards using the public list - but there was also a point of confusion on how we detected these machines, using puppet. If pybal clarifies it, yay!
I'm not sure how to interpret the pybal, but that's probably because my explanation of the problem was tremendously unclear. Essentially, we want to be excluding internal IP spaces, because they contain a lot of automatically-generated traffic (fundraising, I'm looking at you). So, we exclude all requests from IPs within our ranges. Except, then we also exclude all the SSL traffic, since that will appear to come from an internal IP address, from the point of view of the request logs.
So, do I interpret this pybal as: if it's tagged as HTTPS, it's an SSL terminator, and so requests from those machines, from internal IP addresses, should be included? Or: those are the SSL machines, find out their IP addresses and you find out the internal IPs that represent SSLd requests, rather than internally-generated traffic?
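To make the second reading concrete, here is a minimal Python sketch of the filter it would imply. It is not a settled answer to the question above. The network range, the terminator address and the function name keep_request are all made up for illustration; real values would have to come from the actual network configuration and the pybal host lists.

```python
import ipaddress

# Hypothetical values for illustration only.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
SSL_TERMINATOR_IPS = {ipaddress.ip_address("10.20.0.5")}


def keep_request(client_ip):
    """Decide whether a logged request should count as external traffic.

    Requests from known SSL terminators are kept even though their
    address is internal; every other internal address is dropped.
    """
    ip = ipaddress.ip_address(client_ip)
    if ip in SSL_TERMINATOR_IPS:
        return True
    return not any(ip in net for net in INTERNAL_NETWORKS)


print(keep_request("10.20.0.5"))     # SSL terminator
print(keep_request("10.1.2.3"))      # other internal machine
print(keep_request("198.51.100.7"))  # external client
```

The terminator check runs before the internal-range check, so the whitelist takes precedence over the exclusion.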
Have fun, Christian
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3
4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Oliver Keyes
Research Analyst
Wikimedia Foundation