Hm...Let me check with Andrew on whether it is an acceptable behavior for the UDF.
On Jan 23, 2015, at 04:15, Ananth RK <ananthrk@ymxdata.com> wrote:Hi Christian,_______________________________________________On Thu, Jan 22, 2015 at 10:48 PM, <christian@quelltextlich.at> wrote:
> Okay. Assuming we append client_ip to X-Forwarded-For and process the list
> from right-to-left, do you still think we should explicitly check whether
> client_ip is a trusted proxy? Because, per the modified algorithm, the
> client_ip will be the first item to be processed and if it is not a trusted
> proxy, it will be returned as the result - which is what even the explicit
> initial check will result in this case as well. Correct?
Right, if you implement the pseudo-code algorithm I sketched, you can
skip that step.
But if you optimize and loop-unroll the first iteration (like your
implementation did), you want to check whether or not it's a trusted
proxy.
But whether one would label that loop-unrolling “premature
optimization” or “making the code more readable” depends a bit on the
concrete formulation in code.Okay.
> > The way you combine the looping and if looks like you might be
> > skipping across entries that you cannot parse.
> > But if you find invalid entries while backtracking the IPs, you're
> > probably in parts of the X-Forwarded-For value that you shouldn't trust.
>
> > (X-Forwarded-For can easily be spoofed by clients)
>
> We only check that each entry is a valid IPv4 or IPv6 address and not just
> a random string.
> If we find invalid entries, then the only option we have
> is to continue the search further for a valid IP.
Ex falso quodlibet.(Oh Latin! How much I missed you!)
Hence, the only option one has is to stop at untrusted values in the
chain of X-Forwarded-For IPs.And pick the last known good IP?
> > list_of_ips = append client_ip to X-Forwarded-For
> > for ip in reverse( list_of_ips )
> > do
> > if ip is not a trusted proxy or the iterator does not have more
> elements
> > then
> > return ip
> > fi
> > done
>
> > (That should give invalid IP addresses for some requests. That
> > seems to be the correct thing from my point of view. But if you rather
> > geolocate wrong than not geolocate at all, you can choose to pick the
> > last known good IP address instead. I guess that's more a matter of
> > taste.)
>
> What would picking the last known good IP address in this context? An IP
> that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy?
As indicated above, I'd rather indicate “Unable to geolocate” than
knowingly geolocate wrong.
But by the “last known good IP address” above I meant (in case of
faulty X-Forwarded-For), the last IP address that was not obviously
wrong could be picked.
Let's have some examples.
For the examples, I am assuming that only 1.0.0.0/8 are trusted
proxies.
* Case 1:
Client IP: 2.0.0.1
X-Forwarded-For: 1.0.0.2, 1.0.0.3
-> last known good IP address: “2.0.0.1”
* Case 2:
Client IP: 1.0.0.1
X-Forwarded-For: 1.0.0.2, 2.0.0.1
-> last known good IP address: “2.0.0.1”
* Case 3:
Client IP: 1.0.0.1
X-Forwarded-For: foo, 2.0.0.1
-> last known good IP address: “2.0.0.1”
* Case 4:
Client IP: 1.0.0.1
X-Forwarded-For: 1.0.0.2, 1.0.0.3, foo
-> last known good IP address: “1.0.0.1”
* Case 5:
Client IP: 1.0.0.1
X-Forwarded-For: 1.0.0.2, foo, 1.0.0.3
-> last known good IP address: “1.0.0.3”
* Case 6:
Client IP: 1.0.0.1
X-Forwarded-For: foo, 1.0.0.2, 1.0.0.3
-> last known good IP address: “1.0.0.2”
* Case 7:
Client IP: foo
-> This should never happen. File a bug in phabricator.Thanks for the case-by-case explanation. It is *really* clear.
But let me repeat that I think “Unable to geolocate” is a valid and
good response. Better than using some proxy in the chain.Hm...Let me check with Andrew on whether it is an acceptable behavior for the UDF.Thanks,Ananth
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics