[Foundation-l] Release of squid log data

Gwern Branwen gwern0 at gmail.com
Sun Sep 16 21:03:33 UTC 2007


On 2007.09.15 01:38:00 -0400, Gregory Maxwell <gmaxwell at gmail.com> scribbled 11 lines:
> On 9/15/07, Gwern Branwen <gwern0 at gmail.com> wrote:
> > In a very strong sense, we can 'safely' make no data available.
>
> This is a counter-productive over-statement. It is only true in the
> same sort of useless sense that many dramatic maxims are true in...

Dramatic maxims are useful for shock value, which is what is needed here, since people seem to think we can release vast amounts of data without worrying about abuse at all. This attitude shocks me a little: almost by definition, this proposal involves releasing even more data than usual, and we've already seen public data abused. Not to mention that you *can't* trust researchers to keep data confidential, any more than you could trust anyone else. (Remember the AOL search-log release? It was one of AOL's own researchers who released it.)

Every bit of data reduces privacy and anonymity; this is a fact of life, akin to one-time pads being unbreakable, to lossless compression being unable to shrink every string, or to hashes shorter than their input necessarily having collisions...

> I would not characterize it as such had you made any effort to
> concretely connect the background material, interesting as it will be
> to those who haven't seen it, to some aspect of our actual situation.

I assume everyone here is intelligent and doesn't need things spelled out in excruciating detail. For example, when I cite a specific Freedom House report, I assume I don't need to link the exact PDF; everyone here knows how to use Google, since they've managed to subscribe to this list and are reading it.

When I cite a research paper showing that database inference attacks are powerful enough to defeat pseudonymization and many other schemes, I shouldn't need to spell out something rude and blunt, perhaps along the lines of: "Oh, and everyone on the list who has suggested that we could just pseudonymize everything, or release only part of each IP address: they're all incredibly naive fools with no appreciation for how hard security is or how much information can be extracted from deceptively little data, and they really should just shut up and go read _Applied Cryptography_, or a bunch of Crypto-Gram back issues*, and never again pontificate on security issues involving real people until they do."

The question here is not whether we can mangle the data so that there is no danger of privacy violations. That danger exists, and it always will. The question is whether we can reduce it below the everyday risks of using the Internet, so that our users have no reason to say our privacy policy is a pack of lies and the WMF has stabbed them in the back.

Right now, I'm not convinced it's worth it. Has anyone even said what the researchers want it for?

--
gwern
SecDef AKR FLAME GEODSS on Blackmednet EODN keebler mines ^X

*or anything, really! I happen to like Bruce Schneier's writings, but there's a lot of security literature that would make the same point.