Re: [Foundation-l] Release of squid log data

16 Sep 2007

On 2007.09.15 01:38:00 -0400, Gregory Maxwell <gmaxwell(a)gmail.com> scribbled 11 lines:
...
  On 9/15/07, Gwern Branwen <gwern0(a)gmail.com> wrote:
  In a very strong sense, we can 'safely' make no data available.
 This is a counter-productive over-statement. It is only true in the
 same sort of useless sense that many dramatic maxims are true in... 
Dramatic maxims are useful for shock value, which is what is needed here, since people seem
to think that we can release vast amounts of data and not worry about abuses at all.
This attitude shocks me a little, since almost by definition this subject involves
releasing even more data than usual, and we've already seen abuses of public data. Not
to mention that you *can't* trust researchers to keep it confidential, any more than
you could anyone else. (Remember the AOL thing? It was one of their researchers who
released it.)

Every bit of data reduces privacy and anonymity; this is a fact of life akin to one-time
pads being unbreakable, or lossless compression being unable to compress some strings, or
hashes shorter than their input necessarily having collisions...
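
To make the "every bit of data" point concrete, here is a minimal sketch. The records
and field names are invented for illustration and are not taken from any real log; the
only claim is that each extra fact an observer learns can shrink the set of people that
fact pattern could belong to.

```python
# Hypothetical population: every field and value below is made up.
population = [
    {"user": "A", "country": "US", "browser": "Firefox", "hour": 1},
    {"user": "B", "country": "US", "browser": "Firefox", "hour": 14},
    {"user": "C", "country": "US", "browser": "Opera", "hour": 1},
    {"user": "D", "country": "DE", "browser": "Firefox", "hour": 1},
    {"user": "E", "country": "DE", "browser": "Opera", "hour": 14},
    {"user": "F", "country": "FR", "browser": "Firefox", "hour": 14},
]


def anonymity_set(records, known):
    """Return the users consistent with everything the observer knows."""
    return [r["user"] for r in records
            if all(r[key] == value for key, value in known.items())]


def shrinkage(records, facts):
    """Size of the anonymity set as the facts are learned one at a time."""
    known = {}
    sizes = [len(anonymity_set(records, known))]
    for key, value in facts:
        known[key] = value
        sizes.append(len(anonymity_set(records, known)))
    return sizes


facts = [("country", "US"), ("browser", "Firefox"), ("hour", 1)]
print(shrinkage(population, facts))
```

None of the three facts identifies anyone on its own in this toy data; together they
leave a single candidate.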

...
  I would not characterize it as such had you made any effort to
 concretely connect the background material, interesting as it will be
 to those who haven't seen it, to some aspect of our actual situation. 
I assume everyone here is intelligent and doesn't need to have things spelled out in
excruciating detail. For example, when I cite a specific Freedom House report, I assume I
don't need to link the specific PDF - everyone here knows how to use Google because
they've successfully subscribed to this list and are reading it.

When I cite a research paper showing that database inference attacks are powerful enough
to defeat pseudonymization and many other schemes, I don't think I should need to
specifically say something rude and blunt; perhaps along the lines of "Oh, and
everyone on the list who has suggested that we could just pseudonymize everything or only
release parts of IP addresses - they're all incredibly naive fools with no appreciation
for just how hard security is and how much information could be extracted from deceptively
little data, and they really should just shut up and go read _Applied Cryptography_, or a
bunch of Crypto-Gram back issues* and never again pontificate on security issues involving
real people until they do."
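
For anyone who does want it spelled out, here is a rough sketch of one kind of linkage
that pseudonymization does not stop. Everything in it is an assumption made for
illustration: the log layout, the "Edit:"/"Read:" request labels, the usernames, the
salt, and the helper names `pseudonymize`, `link`, and `exposed_reads` are all
hypothetical, and the matching is far cruder than the attacks in the literature. The idea
is only that a pseudonymized request log can be joined against the public edit history
by page and time, after which every other request under that pseudonym is attributed too.

```python
import hashlib

SALT = b"secret-salt"  # hypothetical; the attacker never needs to learn it


def pseudonymize(ip):
    """Replace an IP address with a salted, truncated hash."""
    return hashlib.sha256(SALT + ip.encode()).hexdigest()[:8]


# Hypothetical released log: (pseudonym, time in seconds, request)
released_log = [
    (pseudonymize("192.0.2.7"), 1000, "Edit:Squid_(software)"),
    (pseudonymize("192.0.2.7"), 1300, "Read:Embarrassing_topic"),
    (pseudonymize("198.51.100.9"), 1010, "Read:Main_Page"),
    (pseudonymize("198.51.100.9"), 2000, "Edit:Cryptography"),
]

# Public edit history, visible to anyone: (username, time, page)
public_edits = [
    ("ExampleUser", 1001, "Squid_(software)"),
    ("OtherUser", 2002, "Cryptography"),
]


def link(log, edits, slack=5):
    """Map pseudonyms to usernames by matching edit requests to public edits."""
    mapping = {}
    for pseudonym, when, request in log:
        kind, _, page = request.partition(":")
        if kind != "Edit":
            continue
        for user, edit_time, edit_page in edits:
            if edit_page == page and abs(edit_time - when) <= slack:
                mapping[pseudonym] = user
    return mapping


def exposed_reads(log, mapping):
    """List (username, page) for every read made under a linked pseudonym."""
    return sorted(
        (mapping[pseudonym], request.partition(":")[2])
        for pseudonym, _, request in log
        if pseudonym in mapping and request.startswith("Read:")
    )


mapping = link(released_log, public_edits)
print(exposed_reads(released_log, mapping))
```

The salt is never broken here; the join on page and time does all the work.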

The question here is not whether we can mangle the data so there is no danger of privacy
violations. That danger exists, and it always will. The question is, can we reduce that
danger to below the average everyday risks of using the Internet, such that our users
won't have any reason to say that our privacy policy is a pack of lies and that the WMF
has stabbed them in the back?

Right now, I'm not convinced it's worth it. Has anyone even said what the
researchers want it for?

--
gwern
SecDef AKR FLAME GEODSS on Blackmednet EODN keebler mines ^X

*or anything, really! I happen to like Bruce Schneier's writings, but there's a
lot of security literature that would make the same point.
