On 9/14/07, Tim Starling <tstarling(a)wikimedia.org> wrote:
I wouldn't recommend using a hashed IP address to anyone involved in
academic work. I've worked in the academic sector, I know how important
it is for data to be above any criticism. Any data using unique IP
addresses as an estimate of individual user population would be severely
skewed by proxies and NAT.
Perhaps, to avoid potentially violating our own privacy policy, we can
meet the researchers halfway. If we can find out why they need IP
addresses, we can craft the data we send them to satisfy their request.
For example:
a) they could just need the unique addresses to link together browsing
patterns, but not care whether they are actual IP addresses. We could
convert the addresses into a unique number (or a salted hash) and send
them the data.
b) they could be looking for network topology information; we could
give them the first two or three octets of the IP address.
c) they could be looking for geographical distribution of queries; we
could do the geo-lookup of addresses and give them coordinates for
each address instead of the address itself.
Obviously, a, b and c are all still somewhat contentious, but probably
less so than just giving them raw IP addresses, and could be a good
compromise.
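
To make options (a) to (c) above a bit more concrete, here is a rough
Python sketch of what each transformation might look like. The function
names, the SALT constant, and the one-decimal rounding are my own
assumptions for illustration, not anything we run today. The geo-lookup
itself is left out, since it depends on whichever database we would use;
the last function only shows coarsening coordinates that a lookup has
already returned.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical secret salt: generated once, kept private, never shipped
# with the data set.
SALT = secrets.token_bytes(32)


def pseudonymize(ip: str, salt: bytes = SALT) -> str:
    """Option (a): keyed hash, so equal addresses map to equal tokens."""
    packed = ipaddress.ip_address(ip).packed
    return hmac.new(salt, packed, hashlib.sha256).hexdigest()[:16]


def truncate(ip: str, octets: int = 3) -> str:
    """Option (b): keep the first `octets` octets of an IPv4 address."""
    if not 1 <= octets <= 4:
        raise ValueError("octets must be between 1 and 4")
    network = ipaddress.IPv4Network(f"{ip}/{8 * octets}", strict=False)
    return str(network.network_address)


def coarsen(lat: float, lon: float, places: int = 1) -> tuple[float, float]:
    """Option (c): round looked-up coordinates to limit their precision."""
    return (round(lat, places), round(lon, places))


if __name__ == "__main__":
    print(pseudonymize("192.0.2.77"))
    print(truncate("192.0.2.77"))
    print(coarsen(52.3702, 4.8952))
```

One caveat on (a): there are only about four billion IPv4 addresses, so
anyone who gets hold of the salt can rebuild the mapping by hashing all
of them. The salt has to stay secret for the hash to mean anything.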
-ilya