(This mail is in response to a thread on Commons-l; I'm cross-posting it to the toolserver list because it seems relevant there. Ultimately, this should probably be discussed with the WMF and its local associates.)
Please note that publishing "intelligence" obtained from analyzing person-related data may be considered a violation of that person's privacy, even if the analyzed data is publicly available - at least under German law and, afaik, EU guidelines. Data mining can expose things about a person that are not easily found out by looking straight at the raw data - this is often problematic, especially since the results can be quite misleading, as per the nature of the methods used.
Whether this is stupid or not is beside the point. We have been asked explicitly by the German Wikimedia e.V. not to make any analysis of user data available on the toolserver, so we won't (although I find it a bit hard to draw a line). Whether it would be legal for the US-based Foundation to do it is a different question. A different question still is whether it would be wise and desirable. To quote Wau Holland and the "hacker ethics" of the Chaos Computer Club: "utilize public data, protect private data". Information wants to be free - but so do people. In the end, the latter are more important.
I personally feel that any analysis that exposes information that is not *relevant* to activity on the project should be strictly opt-in. An example would be a breakdown of user activity by time of day or day of the week. I'm not sure about things like the number of untagged images a user has uploaded, for example - that does seem relevant to me. Whether it's legal or wise to expose such an analysis is an open question to me (actually, I'd like to have some input on this, since my tools can give that statistic).
On the other hand, I believe that admins should expect to be subject to "public oversight". This is, in my opinion, an important part of an informal "watch the powerful" mechanism. We already have the ability to see a list of the "admin actions" an admin has performed. I'm a bit unsure about consolidating that log data into statistics such as deletions per week or whatever - I think we should ask ourselves how useful that would really be. In any case, it should be made more obvious to people what "data trails" they leave when working on Wikimedia projects, both as a "normal" user and as an admin.
General statistics about admin activity - i.e. summed over all admins, not per person - would be quite interesting, though, and unproblematic.
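To illustrate the difference, here is a minimal sketch of such an aggregate in Python. The log rows, the names and the function are invented for the example and are not taken from any real tool or from the actual toolserver database layout; the point is only that the result is keyed by week and contains no user names.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log rows: (admin name, action, ISO timestamp).
# This layout is an assumption for illustration only.
LOG = [
    ("AdminA", "delete", "2006-08-07T10:15:00"),
    ("AdminB", "delete", "2006-08-08T22:40:00"),
    ("AdminA", "block", "2006-08-09T09:05:00"),
    ("AdminB", "delete", "2006-08-15T13:30:00"),
]


def weekly_totals(rows, action):
    """Count `action` entries per ISO (year, week), summed over all admins."""
    totals = Counter()
    for _admin, row_action, stamp in rows:
        if row_action != action:
            continue
        # The admin name is deliberately dropped; only the week is kept.
        year, week, _weekday = datetime.fromisoformat(stamp).isocalendar()
        totals[(year, week)] += 1
    return dict(totals)


print(weekly_totals(LOG, "delete"))
```

A per-person version would simply add the admin name to the key, which is exactly the step that raises the privacy question.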
Regards, Daniel
Okay, do individuals' deletion totals need to be made opt-in then?
If I have a question about whether a specific feature is okay, who is the best person to ask?
Does a user's total number of edits count as "intelligence that is not relevant to Wikipedia activity"?
-Dave
On Thu, Aug 10, 2006 at 01:11:44PM +0200, Daniel Kinzler wrote:
[...]
-- Homepage: http://brightbyte.de _______________________________________________ Toolserver-l mailing list Toolserver-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/toolserver-l
interiot wrote: Okay, do individuals' deletion totals need to be made opt-in then?
Not sure - I would have thought that is unproblematic, but Paddy's comments on Commons-l imply the opposite.
If I have a question about whether a specific feature is okay, who is the best person to ask?
I think it would be a very good idea to ask the Foundation to develop a privacy guideline for MediaWiki and tool developers.
Does a user's total number of edits count as "intelligence that is not relevant to Wikipedia activity"?
Since several wikis have policies requiring a minimum number of edits to the main namespace, etc., it *is* relevant. OTOH, I'm a bit undecided whether it's really OK to expose this information without asking.
Ideally, new users would be presented with a text explaining what data they expose when they contribute, and how this may be analyzed. This may even be made a click-through part of the account creation process.
Also ideally, users would be able to state which types of analysis they want to allow. Perhaps a simple scheme of "no statistics", "simple statistics" and "detailed statistics" would be enough, with a default of "simple". "Simple" statistics would basically be total counts (number of edits, maybe per namespace, number of deletions, etc), "detailed" would be "anything goes". A per-project policy could then require admins to at least allow simple statistics, or something like that.
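As a rough sketch of how such a scheme could look in Python: everything below (the level names, the default, the admin minimum and both functions) is made up for illustration; no such preference exists in MediaWiki or on the toolserver as far as this thread says.

```python
from enum import IntEnum


class StatsLevel(IntEnum):
    """Hypothetical per-user setting; higher values allow more analysis."""
    NONE = 0       # no statistics
    SIMPLE = 1     # total counts only (edits, deletions, ...)
    DETAILED = 2   # anything goes


DEFAULT_LEVEL = StatsLevel.SIMPLE   # used when the user has not chosen
ADMIN_MINIMUM = StatsLevel.SIMPLE   # example of a per-project policy


def effective_level(chosen=None, is_admin=False):
    """Combine the user's choice with the default and the project policy."""
    level = DEFAULT_LEVEL if chosen is None else chosen
    if is_admin:
        # Policy: admins must allow at least simple statistics.
        level = max(level, ADMIN_MINIMUM)
    return level


def may_publish(required, chosen=None, is_admin=False):
    """True if a statistic needing `required` may be shown for this user."""
    return effective_level(chosen, is_admin) >= required


# A user who never touched the setting: totals are fine, details are not.
print(may_publish(StatsLevel.SIMPLE))    # True
print(may_publish(StatsLevel.DETAILED))  # False
```

A tool would then call something like `may_publish()` before showing any per-user figure, so the policy lives in one place instead of in every tool.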
I'm not sure if this would be feasible and/or legally sound. I'm just brainstorming here.
-- Daniel
PS: sorry for cross-posting again. I guess this discussion should be moved to the foundation list or something.
If I have a question about whether a specific feature is okay, who is the best person to ask?
I think it would be a very good idea to ask the Foundation to develop a privacy guideline for MediaWiki and tool developers.
I was under the impression that the edit-count opt-in was required due to possible legal liability to German Wikimedia, not because of Foundation issues.
I'm not sure if this would be feasible and/or legally sound. I'm just brainstorming here.
It would be good to have better guidelines from German Wikimedia... toolserver authors can write tools and wait for German Wikimedia to request that they be turned off or moved to other servers, but tool developers would spend less time writing wasted code if they knew the German toolserver policy better beforehand.
Another question: If the toolserver had the ability to confirm that a specific user is an admin or a checkuser, would it be okay with German Wikimedia if tools provided all information to admins or checkusers? (e.g. even when other users haven't opted in yet?)
-Interiot
interiot@68k.org wrote:
I was under the impression that the edit-count opt-in was required due to possible legal liability to German Wikimedia, not because of Foundation issues.
Foremost, it's a request by the German Wikimedia e.V., yes. I do think that the foundation should have a clear position on this however, independent of local legislation. After all, it also concerns possible future features of MediaWiki.
It would be good to have better guidelines from German Wikimedia... toolserver authors can write tools and wait for German Wikimedia to request that they be turned off or moved to other servers, but tool developers would spend less time writing wasted code if they knew the German toolserver policy better beforehand.
Indeed. I guess no one really thought about it until now.
Another question: If the toolserver had the ability to confirm that a specific user is an admin or a checkuser, would it be okay with German Wikimedia if tools provided all information to admins or checkusers? (e.g. even when other users haven't opted in yet?)
In some cases, I would say yes. But it has to be decided on a case by case basis, IMHO. Saying "admins can see anything" wouldn't do.
-- Daniel
Another question: If the toolserver had the ability to confirm that a specific user is an admin or a checkuser, would it be okay with German Wikimedia if tools provided all information to admins or checkusers? (e.g. even when other users haven't opted in yet?)
In some cases, I would say yes. But it has to be decided on a case by case basis, IMHO. Saying "admins can see anything" wouldn't do.
I don't think it's consistent to say that German toolserver developers can run queries that admins can't, when some of us aren't admins, and most of us aren't checkusers.
Do toolserver developers need to start being careful about the private queries they run? Even though most of the data is available from download.wikimedia.org? (I guess the archive table is only partially available to people who have downloaded previous versions of download.wikimedia.org data)
-Interiot
I don't think it's consistent to say that German toolserver developers can run queries that admins can't, when some of us aren't admins, and most of us aren't checkusers.
The question is what data is *published*. But yes, perhaps there should be more community control over who gets access to the toolserver. While there's no highly sensitive data visible, the possibilities of mining the database could still be abused.
Do toolserver developers need to start being careful about the private queries they run?
You always have that responsibility if you are going to publish the results.
Even though most of the data is available from download.wikimedia.org?
Whether the raw data is public is not really relevant: afaik legally, and imho morally.
-- Daniel
toolserver-l@lists.wikimedia.org