[Foundation-l] Release of squid log data

Gregory Maxwell gmaxwell at gmail.com
Sat Sep 15 17:18:35 UTC 2007


On 9/15/07, Ben McIlwain <cydeweys at gmail.com> wrote:
[snip]
> The AOL search data was
> NOT tagged with pseudonymous data (by which I'm assuming you mean
> usernames).  It was tagged with random numbers.  The way privacy was
> compromised in the AOL search data scandal had nothing to do with what
> the data was labeled as and everything to do with what the data was.
> One could look at all of the searches made by a given person and clue in
> on who they were - e.g. by looking for local subjects in their searches,
> see if they searched for anyone by name (maybe themselves or people they
> knew), see if they searched for any esoteric subjects, etc.

A unique random ID is a pseudonym.  The ability to tie multiple
searches to the same pseudonym was key. While I could guess the
probable identity behind a single search in some cases without any
pseudonym, it is, as you pointed out, the ability to tie them together
which creates trouble.
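
To make the linkage point concrete, here is a minimal Python sketch.
Everything in it is made up for illustration: the user names, the
queries, and the helper names pseudonymize and group_by_id are
hypothetical, and it does not reflect how AOL or Wikimedia actually
processed any data. It only shows that swapping names for random
numbers still leaves each person's searches grouped together.

```python
import random
from collections import defaultdict


def pseudonymize(records, seed=0):
    """Replace each user name with a random numeric ID, stable per user.

    `records` is a list of (user, query) pairs. The names are dropped
    from the output, but every query keeps its owner's random ID.
    """
    rng = random.Random(seed)
    ids = {}
    used = set()
    released = []
    for user, query in records:
        if user not in ids:
            new_id = rng.getrandbits(32)
            while new_id in used:  # keep IDs unique per user
                new_id = rng.getrandbits(32)
            used.add(new_id)
            ids[user] = new_id
        released.append((ids[user], query))
    return released


def group_by_id(released):
    """Rebuild one search history per random ID (the linkage step)."""
    grouped = defaultdict(list)
    for pid, query in released:
        grouped[pid].append(query)
    return dict(grouped)


# Hypothetical, made-up log entries.
records = [
    ("alice", "plumbers in smalltown"),
    ("bob", "weather tomorrow"),
    ("alice", "alice example phone number"),
]

released = pseudonymize(records)
profiles = group_by_id(released)

for pid, queries in profiles.items():
    print(pid, queries)
```

No name appears in the released data, yet the two searches from the
same person still sit under one ID, and read together they narrow down
who that person is far more than either search does alone.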

The point Tim was making was that the data Wikimedia has *previously
released* did not include any sort of identifier, pseudonymous or
not, and thus doesn't have the same risks.

The data which is *proposed* to be disclosed would include IPs, which
act as either a pseudonymous identifier or an outright identifier.
I doubt Tim would disagree that there are significant privacy
implications in the case of those, which is, of course, why he said
they were willing to enter into an NDA.


