Thanks for the detailed response.

> That looks like (if X-Forwarded-For is “valid”) your jumping straight
> to considering the X-Forwarded-For header. You might want to check
> beforehand whether the client ip is a “trusted” proxy. And only if it
> is, you should walk the X-Forwarded-For.

> Also, it sounds a bit like you're going through X-Forwarded-For from
> left to right. Make sure to walk it from right to left, as proxies are
> expected to append (not prepend) the client IP they forward for.

Okay. Assuming we append client_ip to X-Forwarded-For and process the list from right-to-left, do you still think we should explicitly check whether client_ip is a trusted proxy? Because, per the modified algorithm, the client_ip will be the first item to be processed and if it is not a trusted proxy, it will be returned as the result - which is what even the explicit initial check will result in this case as well. Correct?


> The way you combine the looping and if looks like you might be
> skipping across entries that you cannot parse.
> But if you find invalid entries while backtracking the IPs, you're
> probably in parts of the X-Forwarded-For value that you shouldn't trust.

> (X-Forwarded-For can easily be spoofed by clients)

We only check that each entry is a valid IPv4 or IPv6 address and not just a random string. If we find invalid entries, then the only option we have is to continue the search further for a valid IP.


>  list_of_ips = append client_ip to X-Forwarded-For
>   for ip in reverse( list_of_ips )
>   do
>       if ip is not a trusted proxy or the iterator does not have more elements
>       then
>           return ip
>       fi
>   done

> (That should give invalid IP addresses for some requests. That
> seems to be the correct thing from my point of view. But if you rather
> geolocate wrong than not geolocate at all, you can choose to pick the
> last known good IP address instead. I guess that's more a matter of
> taste.)

What would picking the last known good IP address in this context? An IP that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy?


> Different stakeholders at the WMF use different lists of what kind of
> proxies they consider. So things are not really clear cut.

Compiling a common list - may be union of "General", "MediaWiki" and "Wikipedia Zero" - would be very useful. As Oliver suggests, it would be good to have them in site-wide config files.

Thanks,
Ananth