Hi,
On Thu, Dec 11, 2014 at 06:27:02PM -0500, Oliver Keyes wrote:
On Sun, Dec 07, 2014 at 12:59:27PM +0100, Christian Aistleitner wrote:
[...] I'm not sure how to interpret the pybal,
The example file linked above holds lines like
{ 'host': 'amssq36.esams.wmnet', 'weight': 1, 'enabled': True }
Such a line means:
The host 'amssq36.esams.wmnet' [1] is [2] an SSL terminator for text cluster in esams [3], and has weight 1 [4].
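As a rough sketch, such a line could be read programmatically as follows. This assumes every host line is a valid Python dict literal (as the example line is) and that commented-out lines start with '#'. The function name parse_pybal_line is hypothetical and only serves the illustration.

```python
import ast

def parse_pybal_line(line):
    """Parse one pybal host line into a dict.

    Returns None for blank lines and for lines commented out with '#'
    (the comment convention is an assumption).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # literal_eval only accepts Python literals, so it does not execute code.
    return ast.literal_eval(stripped)

entry = parse_pybal_line(
    "{ 'host': 'amssq36.esams.wmnet', 'weight': 1, 'enabled': True }"
)
print(entry["host"], entry["enabled"], entry["weight"])
```

The three keys map directly onto the footnotes: 'host' [1], 'enabled' [2], and 'weight' [4].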
Essentially; we want to be excluding internal IP spaces, because that contains a lot of automatically-generated traffic (fundraising, I'm looking at you)
Oliver, I do not like blame games. You blamed Fundraising before for causing lots of internal requests, and I asked you back then to please provide an example. However, you did not provide one. And yet you call out Fundraising again.
Please provide an example [5] of such traffic, so we're all on the same page.
So, we exclude all requests from IPs within our ranges. Except, then we also exclude all the SSL traffic, since that will appear to come from an internal IP address, from the point of view of the request logs.
So, do I interpret this pybal as: if it's tagged as HTTPS,
Since you use 'tag' in different contexts around https, let me clarify how I read 'tag' here. I read it as “If a pybal *-https file lists a host as enabled with positive weight in a line that is not commented out”.
it's an SSL terminator, [...]
Yes.
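The reading of 'tag' given above (enabled, positive weight, line not commented out) can be sketched as a small filter. This is only an illustration under assumptions: lines are Python dict literals, comments start with '#', and the name active_hosts as well as the '.invalid' sample hosts are made up.

```python
import ast

def active_hosts(lines):
    """Return hosts listed as enabled with positive weight on non-commented lines."""
    hosts = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and commented-out lines (assumed to start with '#').
        if not stripped or stripped.startswith("#"):
            continue
        entry = ast.literal_eval(stripped)
        if entry.get("enabled") and entry.get("weight", 0) > 0:
            hosts.append(entry["host"])
    return hosts

# Only the first line is real; the other hosts are invented for the example.
sample = [
    "{ 'host': 'amssq36.esams.wmnet', 'weight': 1, 'enabled': True }",
    "#{ 'host': 'example-commented.invalid', 'weight': 1, 'enabled': True }",
    "{ 'host': 'example-disabled.invalid', 'weight': 1, 'enabled': False }",
    "{ 'host': 'example-zero.invalid', 'weight': 0, 'enabled': True }",
]
print(active_hosts(sample))  # ['amssq36.esams.wmnet']
```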
[...] and so requests from those machines, from internal IP addresses, should be included?
In the end “should be included” is something you have to decide.
But if you see a request whose ip column holds the address of a machine whose name was listed in a pybal *-https file at the time the request was processed, it “typically” is a relayed request from the SSL terminator.
(Note the distinction between my “typically is a relayed request from the SSL terminator” and your “should be included”.)
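To make the check described above concrete, here is a minimal sketch. It assumes you already have the host names that were listed in the pybal *-https files at the relevant time, and that they resolve to the addresses seen in the ip column. The function names and the address 10.0.0.36 are invented for the example, and a match only means “typically relayed”, not “should be included”.

```python
import socket

def terminator_ips(hosts, resolve=socket.gethostbyname):
    """Resolve SSL terminator host names to a set of IP addresses."""
    ips = set()
    for host in hosts:
        try:
            ips.add(resolve(host))
        except OSError:
            # Name does not resolve from this machine; skip it.
            pass
    return ips

def is_likely_ssl_relay(ip, ips):
    """True if the request's ip column matches a known SSL terminator address."""
    return ip in ips

def fake_resolve(host):
    # Stand-in for DNS so the sketch runs anywhere; the address is made up.
    return {"amssq36.esams.wmnet": "10.0.0.36"}[host]

ips = terminator_ips(["amssq36.esams.wmnet"], resolve=fake_resolve)
print(is_likely_ssl_relay("10.0.0.36", ips))
print(is_likely_ssl_relay("10.0.0.99", ips))
```

Passing the resolver in as a parameter keeps the lookup replaceable, for example by a table of historic name-to-address mappings if the host lists changed over time.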
Or: those are the SSL machines, find out their IP addresses and you find out the internal IPs that represent SSLd requests, rather than internally-generated traffic?
I cannot fully parse that sentence. But it sounds a bit as if SSL traffic were not internally-generated traffic. From the logging perspective, SSL traffic is internally-generated traffic:
The SSL terminator performs a separate, genuinely fresh and new request to the caches.
This separate, genuinely fresh and new request gets logged. And that's the log line you're after, if you want to look at https traffic from within Hive.
Have fun, Christian
[1] 'host' field
[2] 'enabled' field
[3] see URL
[4] 'weight' field. You probably need not care about the weight. The weight tells you how much of the overall traffic a node gets. In the given file, all hosts have weight 1, so they all get a similarly sized share of the overall traffic.
[5] Either anonymized on-list, or else for example through a command that we can run on stat1002.