[Foundation-l] Release of squid log data

Kat Walsh mindspillage at gmail.com
Fri Sep 14 18:39:12 UTC 2007


On 9/14/07, Tim Starling <tstarling at wikimedia.org> wrote:
> For a while now, we've been releasing squid log data, stripped of
> personally identifying information such as IP addresses, to groups at
> two universities: Vrije Universiteit and the University of Minnesota. We
> now have a request pending from a third group, at Universidad Rey Juan
> Carlos in Spain. They are asking if they can have the full data stream
> including IP addresses, and they are prepared to sign a confidentiality
> agreement to get it.
>
> I'm leaning towards letting them have it. Via the confidentiality
> agreement, we can avoid the most likely abuse scenarios, such as release
> of individual user profiles. Currently we let toolserver users process
> similar data, assisted by Wikipedia administrators who put web bugs on
> the site. They use it to produce the WikiCharts report. Are we to tell
> prospective research groups to use the toolserver, rather than their own
> substantial hardware, for analysis of Wikipedia traffic patterns?
>
> I'm not sure if this would be allowed under the privacy policy, which does
> mention statistics, but doesn't say who is making them. Maybe the use of
> web bugs by administrators is already against the privacy policy. In any
> case, I think the question would benefit from community discussion,
> which is why I am posting it here.
>
> -- Tim Starling

I don't know if we should be letting any outside groups have the IP
addresses/data we are supposed to keep private; I'm uncomfortable with
that. I'd sooner we have someone here who is already trusted take
requests to run queries. (I note that Greg volunteers to do this...
and, for that matter, has been asking for access to do just such
things in the past.)

I don't think relying on an NDA to keep things private is effective
enough to meet our obligations. If we don't trust people to use proper
research ethics, we shouldn't give them access to anything important in
the first place. But mistakes happen, leaks happen, and being able to
show that someone, somewhere along the way, signed a promise not to
disclose private data doesn't undo the damage done by mishandling it.

As for the rest of the log data, which isn't private: I don't see why
you should need to be a university group to access it. Is there
somewhere to get it publicly, or at least somewhere anyone may make a
request?

-Kat

-- 
Wikimedia needs you: http://wikimediafoundation.org/wiki/Fundraising
* *  * *  * *  * *  * *  * *  * *  * *  * *  * *  * *  * *  * *  * *
http://en.wikipedia.org/wiki/User:Mindspillage | (G)AIM:Mindspillage
mindspillage or mind|wandering on irc.freenode.net | email for phone


