On 9/15/07, Ben McIlwain <cydeweys(a)gmail.com> wrote:
[snip]
> The AOL search data was
> NOT tagged with pseudonymous data (by which I'm assuming you mean
> usernames). It was tagged with random numbers. The way privacy was
> compromised in the AOL search data scandal had nothing to do with what
> the data was labeled as and everything to do with what the data was.
> One could look at all of the searches made by a given person and clue in
> on who they were - e.g. by looking for local subjects in their searches,
> see if they searched for anyone by name (maybe themselves or people they
> knew), see if they searched for any esoteric subjects, etc.

A unique random ID is a pseudonym. The ability to tie multiple
searches to the same pseudonym was key. While I could guess the
probable identity behind a single search in some cases without any
pseudonym, it is, as you pointed out, the ability to tie them together
which creates trouble.
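
To make the linkage point concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration: the record layout, the
IDs, and the queries are invented and are not taken from the AOL data or
from any Wikimedia log. It only shows that a random number, reused across
rows, lets you rebuild a per-person search history.

```python
from collections import defaultdict

# Hypothetical log rows: (random_id, query). All values are made up.
records = [
    (1001, "plumbers in springfield"),
    (1002, "weather"),
    (1001, "jane doe springfield"),
    (1001, "rare orchid society"),
    (1002, "news"),
]


def group_by_id(rows):
    """Collect every query issued under the same identifier."""
    profiles = defaultdict(list)
    for ident, query in rows:
        profiles[ident].append(query)
    return dict(profiles)


def strip_ids(rows):
    """Drop the identifier, leaving single queries that cannot be linked."""
    return [query for _, query in rows]


profiles = group_by_id(records)
print(profiles[1001])      # one person's combined history
print(strip_ids(records))  # the same queries with no way to join them
```

The grouped list for ID 1001 combines a locality, a name, and an esoteric
interest, which is the kind of profile that narrows down an identity. With
the ID column removed, the same queries are still there but each one stands
alone.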
The point Tim was making was that the data Wikimedia has *previously
released* did not include any sort of identifier, pseudonymous or
not, and thus doesn't have the same risks.
The data which is *proposed* to be disclosed would include IPs, which
act as either a pseudonymous identifier or an outright identifier.
I doubt Tim would disagree that there are significant privacy
implications in the case of those. Which is, of course, why he said
they were willing to enter into an NDA.