Hm...Let me check with Andrew on whether it is an
acceptable behavior for the UDF.
Totally fine. Thanks!
> On Jan 23, 2015, at 04:15, Ananth RK <ananthrk(a)ymxdata.com> wrote:
>
> Hi Christian,
>
> On Thu, Jan 22, 2015 at 10:48 PM, <christian(a)quelltextlich.at
<mailto:christian@quelltextlich.at>> wrote:
>
> > Okay. Assuming we append client_ip to X-Forwarded-For and process the list
> > from right-to-left, do you still think we should explicitly check whether
> > client_ip is a trusted proxy? Because, per the modified algorithm, the
> > client_ip will be the first item to be processed and if it is not a trusted
> > proxy, it will be returned as the result - which is what even the explicit
> > initial check will result in this case as well. Correct?
>
> Right, if you implement the pseudo-code algorithm I sketched, you can
> skip that step.
>
> But if you optimize and loop-unroll the first iteration (like your
> implementation did), you want to check whether or not it's a trusted
> proxy.
> But whether one would label that loop-unrolling “premature
> optimization” or “making the code more readable” depends a bit on the
> concrete formulation in code.
>
> Okay.
>
>
> > > The way you combine the looping and if looks like you might be
> > > skipping across entries that you cannot parse.
> > > But if you find invalid entries while backtracking the IPs, you're
> > > probably in parts of the X-Forwarded-For value that you shouldn't
trust.
> >
> > > (X-Forwarded-For can easily be spoofed by clients)
> >
> > We only check that each entry is a valid IPv4 or IPv6 address and not just
> > a random string.
>
> > If we find invalid entries, then the only option we have
> > is to continue the search further for a valid IP.
>
> Ex falso quodlibet.
>
> (Oh Latin! How much I missed you!)
>
>
> Hence, the only option one has is to stop at untrusted values in the
> chain of X-Forwarded-For IPs.
>
> And pick the last known good IP?
>
>
>
> > > list_of_ips = append client_ip to X-Forwarded-For
> > > for ip in reverse( list_of_ips )
> > > do
> > > if ip is not a trusted proxy or the iterator does not have more
> > elements
> > > then
> > > return ip
> > > fi
> > > done
> >
> > > (That should give invalid IP addresses for some requests. That
> > > seems to be the correct thing from my point of view. But if you rather
> > > geolocate wrong than not geolocate at all, you can choose to pick the
> > > last known good IP address instead. I guess that's more a matter of
> > > taste.)
> >
> > What would picking the last known good IP address in this context? An IP
> > that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy?
>
> As indicated above, I'd rather indicate “Unable to geolocate” than
> knowingly geolocate wrong.
>
> But by the “last known good IP address” above I meant (in case of
> faulty X-Forwarded-For), the last IP address that was not obviously
> wrong could be picked.
>
> Let's have some examples.
> For the examples, I am assuming that only 1.0.0.0/8 <http://1.0.0.0/8> are
trusted
> proxies.
>
> * Case 1:
>
> Client IP: 2.0.0.1
> X-Forwarded-For: 1.0.0.2, 1.0.0.3
>
> -> last known good IP address: “2.0.0.1”
>
> * Case 2:
>
> Client IP: 1.0.0.1
> X-Forwarded-For: 1.0.0.2, 2.0.0.1
>
> -> last known good IP address: “2.0.0.1”
>
> * Case 3:
>
> Client IP: 1.0.0.1
> X-Forwarded-For: foo, 2.0.0.1
>
> -> last known good IP address: “2.0.0.1”
>
> * Case 4:
>
> Client IP: 1.0.0.1
> X-Forwarded-For: 1.0.0.2, 1.0.0.3, foo
>
> -> last known good IP address: “1.0.0.1”
>
> * Case 5:
>
> Client IP: 1.0.0.1
> X-Forwarded-For: 1.0.0.2, foo, 1.0.0.3
>
> -> last known good IP address: “1.0.0.3”
>
> * Case 6:
>
> Client IP: 1.0.0.1
> X-Forwarded-For: foo, 1.0.0.2, 1.0.0.3
>
> -> last known good IP address: “1.0.0.2”
>
> * Case 7:
>
> Client IP: foo
>
> -> This should never happen. File a bug in phabricator.
>
> Thanks for the case-by-case explanation. It is *really* clear.
>
>
> But let me repeat that I think “Unable to geolocate” is a valid and
> good response. Better than using some proxy in the chain.
>
Hm...Let me check with Andrew on whether it is an
acceptable behavior for the UDF.
>
> Thanks,
> Ananth
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/analytics