[Foundation-l] Release of squid log data

Anthony wikimail at inbox.org
Sat Sep 15 02:34:21 UTC 2007


On 9/14/07, Ilya Haykinson <haykinson at gmail.com> wrote:
> On 9/14/07, Tim Starling <tstarling at wikimedia.org> wrote:
> > I wouldn't recommend using a hashed IP address to anyone involved in
> > academic work. I've worked in the academic sector, I know how important
> > it is for data to be above any criticism. Any data using unique IP
> > addresses as an estimate of individual user population would be severely
> > skewed by proxies and NAT.
>
> Perhaps in order to prevent potentially violating our own privacy
> policy, we can meet the researchers half-way.

The best way to avoid violating the privacy policy would be to change
it to say exactly what it is you plan on doing, and to not give data
from before the policy is changed.

> If we can find out the
> reason they need IP addresses we can craft the data we send them to
> satisfy their request.  For example:
>
> a) they could just need the unique addresses to link together browsing
> patterns, but not care for them to be IP addresses.  We could
> convert the addresses into a unique number (or a salted hash) and send
> them the data.
>
In case anyone's seriously considering this, make sure you've read
[[AOL search data scandal]], which should show you why replacing the
addresses with unique numbers or hashes is completely useless as
anonymization.  This is *especially* true with Wikipedia data, where the
URLs we access constantly reveal who we are (e.g.
http://en.wikipedia.org/wiki/User_talk:Whatever).
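
To make the problem concrete, here is a rough sketch in Python of what
option a) amounts to.  Everything in it is made up for illustration: the
log lines, the salt, and the pseudonymize_ip helper are assumptions, not
anything taken from the actual squid logs or from a real proposal.  It
only shows that a keyed hash hides the address while still tying every
request from that address, including the identifying URL, to one token.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, salt: bytes) -> str:
    """Return a keyed hash of the address (hypothetical helper).

    The same ip and salt always yield the same token, which is exactly
    what lets a reader of the released data link requests together.
    """
    return hmac.new(salt, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]


# Made-up log entries; 192.0.2.0/24 is a documentation-only address range.
log = [
    ("192.0.2.17", "http://en.wikipedia.org/wiki/User_talk:Whatever"),
    ("192.0.2.17", "http://en.wikipedia.org/wiki/Some_article"),
    ("192.0.2.99", "http://en.wikipedia.org/wiki/Main_Page"),
]

salt = b"example-secret-salt"  # assumption: a secret kept by the releaser

# What the researchers would receive: token instead of address, URL intact.
released = [(pseudonymize_ip(ip, salt), url) for ip, url in log]

# Group the released URLs by token, as anyone holding the data could do.
by_token = {}
for token, url in released:
    by_token.setdefault(token, []).append(url)

for token, urls in by_token.items():
    print(token, urls)
```

The token for the first address carries both the User_talk URL and the
other page it requested, so the browsing history is attributable to
"Whatever" without the address ever being disclosed.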

> b) they could be looking for network topology information; we could
> give them the first two or three octets of the IP address.
>
Three octets would be almost as bad as a) for the same reasons.  Two
octets would be better, but less useful too.
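
For a sense of scale, here is a small Python sketch of the truncation in
b), for IPv4 only.  The truncate_ipv4 and addresses_sharing_prefix names
are hypothetical helpers of mine, and the sample address is from a
documentation range.  It just counts how many addresses end up sharing
each truncated value.

```python
import ipaddress


def truncate_ipv4(ip: str, keep_octets: int) -> str:
    """Zero out everything after the first keep_octets octets."""
    if not 1 <= keep_octets <= 4:
        raise ValueError("keep_octets must be between 1 and 4")
    prefix_len = 8 * keep_octets
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)


def addresses_sharing_prefix(keep_octets: int) -> int:
    """Number of possible IPv4 addresses that map to one truncated value."""
    return 2 ** (8 * (4 - keep_octets))


sample = "192.0.2.17"  # documentation-only address, used as an example
for kept in (3, 2):
    print(kept, truncate_ipv4(sample, kept), addresses_sharing_prefix(kept))
```

With three octets kept, at most 256 addresses share a value, so a
truncated address combined with the URLs is still close to a per-user
identifier.  With two octets kept the bucket grows to 65,536 addresses.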

> c) they could be looking for geographical distribution of queries; we
> could do the geo-lookup of addresses and give them coordinate
> resolution for each address instead of the address itself.
>
If that geo information is limited to country, I guess it wouldn't be too bad.
