On 9/16/07, Gwern Branwen <gwern0(a)gmail.com> wrote:
On 2007.09.15 01:38:00 -0400, Gregory Maxwell
<gmaxwell(a)gmail.com> scribbled 11 lines:
On 9/15/07, Gwern Branwen
<gwern0(a)gmail.com> wrote:
In a very strong sense, we can 'safely'
make no data available.
This is a counter-productive over-statement. It is only true in the
same sort of useless sense that many dramatic maxims are true in...
Dramatic maxims are useful for shock value, which is what is needed here
We probably have an unresolvable difference in value.
In my view decision making processes need 'shock value' as much as
hen-houses need foxes. ...
since people seem to be thinking that we can release
vast amounts of data
and not worry about abuses at all. This attitude shocks me a little,
since almost by definition this subject involves releasing even more
data than usual, and we've already seen abuses of public data.
At the beginning of the thread the initial respondents appeared to be
under the mistaken impression that we were already liberally releasing
effectively identical information.
In later replies the tone has been more negative, to the point where
I'm concerned that we may be at risk of discarding the baby with the
bathwater.
Not to mention that you *can't* trust researchers
to keep it
confidential, any more than you could anyone else.
Well, more than "anyone else" perhaps. Certainly it would be better to
give the data to 'researchers' than a malicious force, or to someone
completely unqualified to handle private data. ... But at the same
time it would be better still to minimize disclosure.
Every bit of data reduces privacy and anonymity; this
is a fact of life
Technically true, but not useful.
I assume everyone here is intelligent and
Then why resort to shock statements and over-generalizations?
[snip]
The question here is not whether we can mangle the
data so there is no
danger of privacy violations. It exists, it will always exist. The
question is, can we reduce that danger to below the average every-day
risks
[snip]
Right now, I'm not convinced it's worth it.
[snip]
I think you are creating a false choice here: The choice when dealing
with private data isn't only between "no release at all" and
"substantial risk but below the average every day risk".
Even while keeping the pedantic "Every bit of data reduces privacy and
anonymity" in mind, there are many types of data extract which pose an
exposure level so low that we can fairly classify it as none when
speaking English rather than pedantese:
For example, no one sane is going to claim that releasing the daily
viewership rates for existent articles with some quantization is going
to cause a measurable impact to anyone's privacy or anonymity.
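
As a rough illustration of what "some quantization" could mean here, the
sketch below rounds each article's daily view count down to a multiple of a
fixed step, so exact counts are never published. The function name, the step
size, and the sample counts are all made up for illustration; nothing in
this thread specifies an actual scheme.

```python
def quantize_views(counts, step=100):
    """Round each daily view count down to the nearest multiple of `step`.

    `counts` maps article title -> raw daily views (hypothetical input format).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return {title: (n // step) * step for title, n in counts.items()}


# Made-up sample data, purely for illustration.
daily = {"Main_Page": 48213, "Hen_house": 137, "Fox": 99}
print(quantize_views(daily))
# {'Main_Page': 48200, 'Hen_house': 100, 'Fox': 0}
```

A coarser step hides more detail; articles with very few views collapse to
zero, which is the case where exact numbers would say the most about
individual readers.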