On 9/14/07, Tim Starling <tstarling(a)wikimedia.org> wrote:
For a while now, we've been releasing squid log data, stripped of
personally identifying information such as IP addresses, to groups at
two universities: Vrije Universiteit and the University of Minnesota. We
now have a request pending from a third group, at Universidad Rey Juan
Carlos in Spain. They are asking if they can have the full data stream
including IP addresses, and they are prepared to sign a confidentiality
agreement to get it.
I'm leaning towards letting them have it. Via the confidentiality
agreement, we can avoid the most likely abuse scenarios, such as release
of individual user profiles. Currently we let toolserver users process
similar data, assisted by Wikipedia administrators who put web bugs on
the site. They use it to produce the WikiCharts report. Are we to tell
prospective research groups to use the toolserver, rather than their own
substantial hardware, for analysis of Wikipedia traffic patterns?
I'm not sure if this would be allowed under the privacy policy, which does
mention statistics, but doesn't say who is making them. Maybe the use of
web bugs by administrators is already against the privacy policy. In any
case, I think the question would benefit from community discussion,
which is why I am posting it here.
-- Tim Starling
I don't know if we should be letting any outside groups have the IP
addresses/data we are supposed to keep private; I'm uncomfortable with
that. I'd sooner we have someone here who is already trusted take
requests to run queries. (I note that Greg volunteers to do this...
and, for that matter, has been asking for access to do just such
things in the past.)
I don't think relying on an NDA to keep things private is effective
enough to meet our obligations. If we don't trust people to use proper
research ethics, we shouldn't give them access to anything important in
the first place. But mistakes happen, leaks happen, and being able to
show that someone along the way signed a promise not to disclose
private data doesn't undo the damage done by mishandling.
As for the rest of the log data, which isn't private: I don't see why you
should need to be a university group to access it. Is there somewhere
to do so publicly, or at least where anyone may make a request?
-Kat
--
http://en.wikipedia.org/wiki/User:Mindspillage | (G)AIM:Mindspillage
mindspillage or mind|wandering on
irc.freenode.net | email for phone