Re: [Analytics] Better way to identify client IP address

23 Jan 2015

Hi Christian,

On Thu, Jan 22, 2015 at 10:48 PM, &lt;christian(a)quelltextlich.at&gt; wrote:

...

  Okay. Assuming we append client_ip to
X-Forwarded-For and process the  list
  from right-to-left, do you still think we should
explicitly check whether
 client_ip is a trusted proxy? Because, per the modified algorithm, the
 client_ip will be the first item to be processed and if it is not a  trusted
  proxy, it will be returned as the result - which
is what even the  explicit
  initial check will result in this case as well.
Correct? 
 Right, if you implement the pseudo-code algorithm I sketched, you can
 skip that step.

 But if you optimize and loop-unroll the first iteration (like your
 implementation did), you want to check whether or not it's a trusted
 proxy.
 But whether one would label that loop-unrolling “premature
 optimization” or “making the code more readable” depends a bit on the
 concrete formulation in code.

Okay.

...

  > The way you combine the looping and if looks
like you might be
 > skipping across entries that you cannot parse.
 > But if you find invalid entries while backtracking the IPs, you're
 > probably in parts of the X-Forwarded-For value that you shouldn't  trust.

  (X-Forwarded-For can easily be spoofed by
clients) 
 We only check that each entry is a valid IPv4 or IPv6 address and not  just
  a random string. 
  If we find invalid entries, then the only option
we have
 is to continue the search further for a valid IP. 
 Ex falso quodlibet.

(Oh Latin! How much I missed you!)

...
  Hence, the only option one has is to stop at untrusted
values in the
 chain of X-Forwarded-For IPs.

And pick the last known good IP?

...

list_of_ips = append client_ip to X-Forwarded-For
   for ip in reverse( list_of_ips )
   do
       if ip is not a trusted proxy or the iterator does not have more  elements
        then
           return ip
       fi
   done 
  (That should give invalid IP addresses for some
requests. That
 seems to be the correct thing from my point of view. But if you rather
 geolocate wrong than not geolocate at all, you can choose to pick the
 last known good IP address instead. I guess that's more a matter of
 taste.) 
 What would picking the last known good IP address in this context? An IP
 that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy? 
 As indicated above, I'd rather indicate “Unable to geolocate” than
 knowingly geolocate wrong.

 But by the “last known good IP address” above I meant (in case of
 faulty X-Forwarded-For), the last IP address that was not obviously
 wrong could be picked.

 Let's have some examples.
 For the examples, I am assuming that only 1.0.0.0/8 are trusted
 proxies.

 * Case 1:

   Client IP: 2.0.0.1
   X-Forwarded-For: 1.0.0.2, 1.0.0.3

   -> last known good IP address: “2.0.0.1”

 * Case 2:

   Client IP: 1.0.0.1
   X-Forwarded-For: 1.0.0.2, 2.0.0.1

   -> last known good IP address: “2.0.0.1”

 * Case 3:

   Client IP: 1.0.0.1
   X-Forwarded-For: foo, 2.0.0.1

   -> last known good IP address: “2.0.0.1”

 * Case 4:

   Client IP: 1.0.0.1
   X-Forwarded-For: 1.0.0.2, 1.0.0.3, foo

   -> last known good IP address: “1.0.0.1”

 * Case 5:

   Client IP: 1.0.0.1
   X-Forwarded-For: 1.0.0.2, foo, 1.0.0.3

   -> last known good IP address: “1.0.0.3”

 * Case 6:

   Client IP: 1.0.0.1
   X-Forwarded-For: foo, 1.0.0.2, 1.0.0.3

   -> last known good IP address: “1.0.0.2”

 * Case 7:

   Client IP: foo

   -> This should never happen. File a bug in phabricator.

Thanks for the case-by-case explanation. It is *really* clear.

...

 But let me repeat that I think “Unable to geolocate” is a valid and
 good response. Better than using some proxy in the chain.

Hm...Let me check with Andrew on whether it is an acceptable behavior for
the UDF.

Thanks,
Ananth

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

Re: [Analytics] Better way to identify client IP address