Thanks for the detailed response.
> That looks like (if X-Forwarded-For is “valid”) your jumping straight
> to considering the X-Forwarded-For header. You might want to check
> beforehand whether the client ip is a “trusted” proxy. And only if it
> is, you should walk the X-Forwarded-For.
> Also, it sounds a bit like you're going through X-Forwarded-For from
> left to right. Make sure to walk it from right to left, as proxies are
> expected to append (not prepend) the client IP they forward for.
Okay. Assuming we append client_ip to X-Forwarded-For and process the list
from right-to-left, do you still think we should explicitly check whether
client_ip is a trusted proxy? Because, per the modified algorithm, the
client_ip will be the first item to be processed and if it is not a trusted
proxy, it will be returned as the result - which is what even the explicit
initial check will result in this case as well. Correct?
> The way you combine the looping and if looks like you might be
> skipping across entries that you cannot parse.
> But if you find invalid entries while backtracking the IPs, you're
> probably in parts of the X-Forwarded-For value that you shouldn't trust.
> (X-Forwarded-For can easily be spoofed by clients)
We only check that each entry is a valid IPv4 or IPv6 address and not just
a random string. If we find invalid entries, then the only option we have
is to continue the search further for a valid IP.
> list_of_ips = append client_ip to X-Forwarded-For
> for ip in reverse( list_of_ips )
> do
> if ip is not a trusted proxy or the iterator does not have more
elements
> then
> return ip
> fi
> done
> (That should give invalid IP addresses for some requests. That
> seems to be the correct thing from my point of view. But if you rather
> geolocate wrong than not geolocate at all, you can choose to pick the
> last known good IP address instead. I guess that's more a matter of
> taste.)
What would picking the last known good IP address in this context? An IP
that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy?
> Different stakeholders at the WMF use different lists of what kind of
> proxies they consider. So things are not really clear cut.
Compiling a common list - may be union of "General", "MediaWiki" and
"Wikipedia Zero" - would be very useful. As Oliver suggests, it would be
good to have them in site-wide config files.
Thanks,
Ananth