[Foundation-l] Release of squid log data

Samuel Klein sj at laptop.org
Sat Sep 15 18:33:55 UTC 2007


On Sat, 15 Sep 2007, Sue Gardner wrote:

> Yes. The first question is, would providing this data violate the
> privacy policy, which protects "private information" - often but not
> always assumed to mean personally-identifiable information. If we
> consider the squid log data to include potentially
> personally-identifiable/private information, then we can't release it
> to a third party. Regardless of how much we trust them, or what they
> are willing to sign.

Agreed.

> If the release does NOT violate the privacy policy, then the question
> becomes whether it violates existing community standards & practices.
> I don't know the answer to that. But there has been lots of discussion
> here, which may suggest there's not a clear consensus view.
> 
> IMO we want to help academics and we share lots of their values.. but
> it is more important that we protect our own community of
> users/contributors. So we want to err on that side.

In particular, Greg raises the interesting point that our community includes 
many eager researchers and statisticians.  For those who don't know Brian 
Mingus (who said earlier in the thread 'I suggest that it never be released, 
and that the foundation hire and/or appoint a statistician for analyzing logs 
in-house.'), he is a Wikipedian, datamining student, and frood who for a time 
ran the best real-time Wikipedia stats on the web, something he stopped doing, 
among other reasons, because it was so difficult for him to get reliable data 
from the source.

I'll also note that Erik Zachte's stats haven't been effectively run on
the largest wikis for some time.

So when we do make stats available, I'd like to see us err on the side of 
giving our community of amazingly talented users/contributors access to them, 
before giving them to a university that asks formally on electronic letterhead.


> Erik Moeller:
>> I might support a research exemption clause in future versions of the
>> policy _if_ a compelling case can be made that such an exemption is
>> needed, and that no alternative research method would produce results
>> of approximately the same quality. So far no such case has been made.
> 
>> Whatever we do, it is crucial that we make it clear to our users
>> through our privacy policy what is going on. In that spirit, I would
>> also appreciate it if the privacy policy could be updated to describe
>> the existing agreements with universities, and the work that is being
>> done on the toolserver.

Yes, please.  There are active Wikipedians with experience in these fields who 
do not subscribe to foundation-l, and they should at least know what
is going on at present.

Is there a good overview online of the work done and processes run on the 
[various] toolserver[s]?

SJ


