[Foundation-l] Release of squid log data

Ilya Haykinson haykinson at gmail.com
Sat Sep 15 01:09:57 UTC 2007


On 9/14/07, Tim Starling <tstarling at wikimedia.org> wrote:
> I wouldn't recommend using a hashed IP address to anyone involved in
> academic work. I've worked in the academic sector, I know how important
> it is for data to be above any criticism. Any data using unique IP
> addresses as an estimate of individual user population would be severely
> skewed by proxies and NAT.

Perhaps, to avoid potentially violating our own privacy policy, we
can meet the researchers halfway.  If we can find out why they need
IP addresses, we can craft the data we send them to satisfy their
request.  For example:

a) they could just need the unique addresses to link together browsing
patterns, and not care whether they are actual IP addresses.  We could
convert the addresses into unique numbers (or salted hashes) and send
them the data.

b) they could be looking for network topology information; we could
give them the first two or three octets of the IP address.

c) they could be looking for the geographical distribution of queries;
we could do the geo-lookup of addresses and give them coarse-resolution
coordinates for each address instead of the address itself.
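A rough sketch of what (a) through (c) might look like, in Python.  The
secret salt, the function names, and the choice of HMAC-SHA256 are my
own assumptions for illustration, not anything the researchers or
Wikimedia have specified.  The geo-lookup in (c) is left out; only the
coarsening of its output is shown.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical per-release secret salt: generated once, kept private,
# and never shipped alongside the released data.
SALT = secrets.token_bytes(32)

def pseudonymize(ip: str, salt: bytes = SALT) -> str:
    """(a) Replace an IP with a keyed hash.  The same address maps to the
    same token within one release, so browsing patterns can still be
    linked, but the address can't be recovered without the salt."""
    return hmac.new(salt, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]

def truncate(ip: str, octets: int = 3) -> str:
    """(b) Keep only the first `octets` octets of an IPv4 address and
    zero the rest, e.g. 192.0.2.77 -> 192.0.2.0 with octets=3."""
    if not 1 <= octets <= 4:
        raise ValueError("octets must be between 1 and 4")
    net = ipaddress.IPv4Network(f"{ip}/{octets * 8}", strict=False)
    return str(net.network_address)

def coarsen(lat: float, lon: float, places: int = 1) -> tuple[float, float]:
    """(c) Round coordinates returned by a geo-lookup (not shown) so each
    record carries only an approximate location.  One decimal place is a
    grid of roughly 11 km in latitude."""
    return (round(lat, places), round(lon, places))

print(truncate("192.0.2.77", 2))       # 192.0.0.0
print(coarsen(51.5074, -0.1278))       # (51.5, -0.1)
```

The salt matters for (a): there are only about four billion IPv4
addresses, so a plain unsalted hash could be reversed by simply hashing
every possible address.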

Obviously, (a), (b), and (c) are all still somewhat contentious, but
probably less so than handing over raw IP addresses, and any of them
could be a good compromise.

-ilya


