[Foundation-l] New draft of privacy policy


Florence Devouard

14 Jun 2008

8:55 a.m.

Hello participants! "term used on purpose"...

Mike has drafted a new version of the privacy policy. Given that this policy is one of the nearest things we have to a definition of the terms of agreement between WMF and editors, I invite you not only to read it carefully, but also to inform your community members on the relevant village pump.

Your input is welcome. Please note that voting on this policy is planned next week-end during the 21st of June board meeting. So, input is welcome NOW.

Thank you

Anthere

THE PAGE: http://meta.wikimedia.org/wiki/Draft_Privacy_Policy_June_2008


Jesse Plamondon-Willard

14 Jun

11:03 p.m.

On Sat, Jun 14, 2008 at 4:55 AM, Florence Devouard <Anthere9(a)yahoo.com> wrote:

...

Your input is welcome. Please note that voting on this policy is planned next week-end during the 21st of June board meeting. So, input is welcome NOW.

While the draft is very good as a supporting explanatory essay, I don't think it's written as a policy; it's unnecessarily verbose, reads like an essay or opinion piece, makes incorrect assumptions (like "everyone can contribute", "history [...] is preserved indefinitely", or "you are encouraged but not required to register with your real name" (some wikis specifically discourage that due to stalking, etc)), significantly addresses non-privacy subjects (like community values, copyright, or user access hierarchy), and uses redundant section numbering (sections are numbered automatically in the table of contents). I think the explanatory material should be moved to a separate essay, so that the policy only contains policy. I've drafted a rewritten policy that addresses these and other concerns (such as undue references to en-Wikipedia) at <http://meta.wikimedia.org/wiki/Talk:Draft_Privacy_Policy_June_2008#Rewrite>. I'd also appreciate input on that rewritten draft. -- Yours cordially, Jesse Plamondon-Willard (Pathoschild)

Florence Devouard

15 Jun

11:10 a.m.

Jesse Plamondon-Willard wrote:

...

On Sat, Jun 14, 2008 at 4:55 AM, Florence Devouard <Anthere9(a)yahoo.com> wrote:

Your input is welcome. Please note that voting on this policy is planned next week-end during the 21st of June board meeting. So, input is welcome NOW.

Hello Pathoschild,

I've dropped input on the rewritten draft. My main concern with it is that it is rewritten in such a way that

* it only addresses privacy issues on the projects themselves (rather than on all activities related to the projects, eg, mailing lists, OTRS).
* it totally neglects issues related to special access users (in particular checkusers etc...)
* it also removes some new decisions recently made by the board (eg, notification of a user when private data has been released upon legal request)

I agree that the original document is a bit verbose and could be simplified in some parts. I also agree that part of it is "descriptive" rather than "policy". However, "simplification" should keep all the meat.

I wonder if it would not be possible to separate this document into two documents.

* One describing the philosophy and the data kept.
* The other being more policy oriented.

OR

Separating more clearly in the document, points related to "projects" and points related to other activities (mailing lists, irc, otrs etc...)

Ant

Jesse Plamondon-Willard

16 Jun

3:53 a.m.

Florence Devouard <Anthere9(a)yahoo.com> wrote:

...

* it also removes some new decisions recently made by the board (eg, notification of a user when private data has been released upon legal request)

The text about notification is present under ==Access to and publication of information==: "In the event of such a legally compulsory request, the Foundation will attempt to notify the affected user [...]." I think the other points are already fixed in the draft FT2 and I worked on (see the talk page). -- Yours cordially, Jesse Plamondon-Willard (Pathoschild)

Ray Saintonge

14 Jun

11:34 p.m.

Florence Devouard wrote:

...

This proposal looks more like an essay full of excess verbiage. There is much in there that has absolutely nothing to do with privacy; it may be valid policy, but it belongs in a different document. The detailed explanatory portions of the document should not be treated as policy. They can be useful, but where they conflict with the actual policy the policy itself should prevail. In my view any policy document should be succinct and to the point. Ec

Anthony

15 Jun

11:40 a.m.

Someone should answer Gregory's question first: "Why do we grant the equivalent of checkuser rights over a majority of our contributors to every person on the planet?" "Historical accident" was the only thing I could come up with.

geni

2:15 p.m.

2008/6/15 Anthony <wikimail(a)inbox.org>:

...

It's hard not to. If we were to say assign a random number to every IP then by now someone would have published a partial list of number to IP relationships. If the number assigns keep changing well we know the problems that we had with AOL back in the day. -- geni

Thomas Dalton

2:27 p.m.

2008/6/15 geni <geniice(a)gmail.com>:

...

2008/6/15 Anthony <wikimail(a)inbox.org>:

You can do it with a hash. Each hash could be mapped to from multiple IP addresses, so it's impossible to work out the IP address from the hash. Of course, you then have the risk of collisions, but that can be kept fairly small, and isn't the end of the world - we get collisions anyway when multiple people use one IP address.

That said, I don't have a problem with publishing IP addresses of anon users - it's made clear to them that that will happen, and they have the option of registering if they want to keep it hidden. The risk from having your IP address publicly known is really pretty minimal (mine is 82.152.59.121 (or 122 if you want my actual computer, rather than the router, but the router is what's reported to the outside world) - do with it what you will!!).
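The hashing idea above can be sketched roughly as follows. This is only an illustration, not anything MediaWiki does: the function name `ip_bucket`, the 24-bit output size, and the secret key are all assumptions. The key in particular is an addition to the proposal, since a plain unkeyed hash of a 32-bit address could be reversed by simply hashing every possible address. Truncating to 24 bits means roughly 256 addresses share each identifier on average, which is the deliberate collision described in the message.

```python
import hashlib
import hmac
import ipaddress


def ip_bucket(ip: str, secret: bytes, bits: int = 24) -> str:
    """Map an IPv4 address to a short identifier that many addresses share."""
    packed = ipaddress.IPv4Address(ip).packed          # 4 raw bytes
    digest = hmac.new(secret, packed, hashlib.sha256).digest()
    # Keep only the top `bits` bits of the first 32 bits of the digest,
    # so that several addresses necessarily land on the same value.
    value = int.from_bytes(digest[:4], "big") >> (32 - bits)
    width = (bits + 3) // 4                            # hex digits needed
    return format(value, "0{}x".format(width))


if __name__ == "__main__":
    key = b"hypothetical-site-secret"                  # assumption, not from the thread
    print(ip_bucket("192.0.2.1", key))
```

A smaller `bits` value gives more collisions (more privacy, less ability to tell editors apart); a larger one gives the opposite.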

David Gerard

2:31 p.m.

2008/6/15 Thomas Dalton <thomas.dalton(a)gmail.com>:

...

2008/6/15 geni <geniice(a)gmail.com>:
> 2008/6/15 Anthony <wikimail(a)inbox.org>:

...

>> Someone should answer Gregory's question first: "Why do we grant the equivalent of checkuser rights over a majority of our contributors to every person on the planet?"
>> "Historical accident" was the only thing I could come up with.

...

> It's hard not to. If we were to say assign a random number to every IP then by now someone would have published a partial list of number to IP relationships. If the number assigns keep changing well we know the problems that we had with AOL back in the day.

...

It's also entirely unclear how this proposal would actually cause a better encyclopedia, dictionary, media archive, quote database etc. to be written. You know, the stuff we're supposed to be here for. Project first, then community. - d.

Gregory Maxwell

10:23 p.m.

On Sun, Jun 15, 2008 at 10:31 AM, David Gerard <dgerard(a)gmail.com> wrote:

...

By this logic we should grant access to Special:Checkuser to everyone. No? Explain. :)

David Gerard

10:55 p.m.

2008/6/15 Gregory Maxwell <gmaxwell(a)gmail.com>:

...

On Sun, Jun 15, 2008 at 10:31 AM, David Gerard <dgerard(a)gmail.com> wrote:

...

> It's also entirely unclear how this proposal would actually cause a better encyclopedia, dictionary, media archive, quote database etc. to be written. You know, the stuff we're supposed to be here for. Project first, then community.

...

By this logic we should grant access to Special:Checkuser to everyone. No? Explain.

You originally claimed something was in need of fixing; support it. - d.

Gregory Maxwell

11:20 p.m.

On Sun, Jun 15, 2008 at 6:55 PM, David Gerard <dgerard(a)gmail.com> wrote:

...

2008/6/15 Gregory Maxwell <gmaxwell(a)gmail.com>:

On Sun, Jun 15, 2008 at 10:31 AM, David Gerard <dgerard(a)gmail.com> wrote:
> It's also entirely unclear how this proposal would actually cause a better encyclopedia, dictionary, media archive, quote database etc. to be written. You know, the stuff we're supposed to be here for. Project first, then community.

By this logic we should grant access to Special:Checkuser to everyone. No? Explain.

You originally claimed something was in need of fixing; support it.

I only asked why we give the equivalent checkuser on half our users to the general public. So far only Anthony has provided a reasonable explanation.

To make you happy I'll go ahead and make an argument for fixing something: I don't see any logical cause for the inconsistency in how we treat registered and unregistered users. There is no particular reason it has to be this way, it seems to be historical accident as Anthony suggested. Instead we could publish the IPs of all edits, we could use opaque identifiers for anons, or we could completely disallow anonymous editing. All of these would be consistent solutions.

The current inconsistent situation generates a lot of problems: Careful COI pushers are rewarded for being smart enough to log in while at the same time normal users are harmed by accidentally getting logged out and having their IP surprisingly leaked. The edit histories of our articles are frequently sliced and diced to hide the IPs of established contributors and this sometimes makes the article history misleading. For example, see my edits on meta today (I swear I didn't do that intentionally to make a point, I have no clue how I ended up logged out) ... my IP edits couldn't be hidden without making the history misleading due to the timing of Cimon's edits. ... and the service of IP edit oversighting is generally only available to the Wiki(p|m)edia elite, if for no other reason than few others know it is available.

Unregistered users account for roughly half of the contributors on at least one of the largest projects (EnWP). They make many valid and useful contributions (along with a bunch of junk...). We often mislead them about their privacy by calling their contributions "anonymous" when they are far less anonymous than the edits made by many registered users.

Checkuser is by far one of the most highly regulated activities on all the projects. We keep a very tight fist over it. Yet, its equivalent is given freely over an enormous subset of the contributors. This smacks of favoritism.

I think our behavior should probably be changed to remove the inconsistency. By removing the inconsistency we will prevent unpleasant surprises. I think the ability to *know* and *understand* the privacy posture you have when editing Wikipedia is more important than what the posture is, so I don't care which path to consistency is taken.

I would presume that of the three I suggested most users would prefer replacing IPs with unique identifiers. The primary harm this path would cause is an increase in need for checkusers. If need-be the increased need for checkusers could be addressed by creating a lower class of checkusers who only have the ability to view the (previously public) information related to unregistered users. Such a solution would preserve an inconsistency but I believe it would be strictly more consistent than the current behavior.

John Vandenberg

16 Jun

5:21 a.m.

On Mon, Jun 16, 2008 at 9:20 AM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

...

On Sun, Jun 15, 2008 at 6:55 PM, David Gerard <dgerard(a)gmail.com> wrote:

2008/6/15 Gregory Maxwell <gmaxwell(a)gmail.com>:

By this logic we should grant access to Special:Checkuser to everyone. No? Explain.

You originally claimed something was in need of fixing; support it.

I only asked why we give the equivalent checkuser on half our users to the general public. So far only Anthony has provided a reasonable explanation.

There is a much more obvious answer: nobody has written the code to do otherwise. An IP is a fixed size which helps with storage, and the properties of IP numbering and re-use are well-known, allowing people to roughly guess when it is a different person on the same IP. Any change to mediawiki to remove or obscure IPs needs to also give a similar ability back to editors; we are human and we like to know how many editors we are working with, even more so when editing behaviour is suspicious.

...

To make you happy I'll go ahead and make an argument for fixing something: I don't see any logical cause for the inconsistency in how we treat registered and unregistered users. There is no particular reason it has to be this way, it seems to be historical accident as Anthony suggested. Instead we could publish the IPs of all edits, we could use opaque identifiers for anons, or we could completely disallow anonymous editing. All of these would be consistent solutions.

It is very strange that we call IP edits "anonymous" yet they are often more revealing than edits made when logged in.

...

The current inconsistent situation generates a lot of problems: Careful COI pushers are rewarded for being smart enough to log in while at the same time normal users are harmed by accidentally getting logged out and having their IP surprisingly leaked. The edit histories of our articles are frequently sliced and diced to hide the IPs of established contributors and this sometimes makes the article history misleading. For example, see my edits on meta today (I swear I didn't do that intentionally to make a point, I have no clue how I ended up logged out) ... my IP edits couldn't be hidden without making the history misleading due to the timing of Cimon's edits. ... and the service of IP edit oversighting is generally only available to the Wiki(p|m)edia elite, if for no other reason than few others know it is available.

The oversight tool desperately needs finer granularity. If the IP is the element that needs to be hidden, it shouldn't be necessary to pretend that the edit didn't happen. Anyone know when the new oversight tool is going to land? https://bugzilla.wikimedia.org/show_bug.cgi?id=3576

Also, many people are not aware that oversight needs to be done before the next dump in order to be useful. I often see admins removing six-month-old IP talk contribs for privacy reasons, and they are a bit surprised and annoyed when I show them the dumps.

...

Unregistered users account for roughly half of the contributors on at least one of the largest projects (EnWP). They make many valid and useful contributions (along with a bunch of junk...). We often mislead them about their privacy by calling their contributions "anonymous" when they are far less anonymous than the edits made by many registered users. Checkuser is by far one of the most highly regulated activities on all the projects. We keep a very tight fist over it. Yet, its equivalent is given freely over an enormous subset of the contributors. This smacks of favoritism. I think our behavior should probably be changed to remove the inconsistency. By removing the inconsistency we will prevent unpleasant surprises. I think the ability to *know* and *understand* the privacy posture you have when editing Wikipedia is more important than what the posture is, so I don't care which path to consistency is taken. I would presume that of the three I suggested most users would prefer replacing IPs with unique identifiers. The primary harm this path would cause is an increase in need for checkusers.

Rather than adding a layer on top of IP to hide the IP, it would be less revealing to automatically assign each new IP session a cookie-managed identifier, i.e. "Guest1234" (or a long random string that does not repeat, such as a GUID) and then allow the user to rename this "guest account" when they finally learn how to.

Also, when a user has accidentally logged out and then logs back in from a guest account to their main account, the system could allow the user to merge those guest edits into their main account.

-- John
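The guest-account idea above could look something like the following toy model. Everything here is hypothetical: `EditLog`, `new_guest`, and `merge` are illustrative names, the edit history is an in-memory list, and the cookie handling itself is left out. It only shows the two behaviours proposed: a non-repeating random guest name, and merging guest edits into a main account after login.

```python
import uuid


class EditLog:
    """Toy in-memory edit history; a stand-in for the wiki's revision table."""

    def __init__(self):
        self.edits = []                     # list of [author, page] pairs

    def new_guest(self) -> str:
        # A random 128-bit identifier that would be stored in the visitor's cookie.
        return "Guest-" + uuid.uuid4().hex

    def record(self, author: str, page: str) -> None:
        self.edits.append([author, page])

    def merge(self, guest: str, account: str) -> int:
        """Reattribute all edits made under `guest` to `account`."""
        moved = 0
        for edit in self.edits:
            if edit[0] == guest:
                edit[0] = account
                moved += 1
        return moved


if __name__ == "__main__":
    log = EditLog()
    guest = log.new_guest()
    log.record(guest, "Sandbox")
    print(log.merge(guest, "ExampleUser"), "edit(s) merged")
```

Because the identifier lives in a cookie rather than being derived from the IP, clearing cookies yields a fresh guest name.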

Gregory Maxwell

7:21 a.m.

On Mon, Jun 16, 2008 at 1:21 AM, John Vandenberg <jayvdb(a)gmail.com> wrote:

...

It would be nearly trivial to feed the IP through a 32-bit block cipher, convert that to base 36 (or just an integer), and use that as the user_text. I'm pretty confident that a reasonably clean solution wouldn't be hard. ::shrugs:: But does anyone anywhere want that behavior in MediaWiki?
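A rough sketch of what that could look like, under stated assumptions: the message does not name a cipher, so the block below uses a toy 8-round Feistel network with an HMAC-SHA256 round function purely to show the shape of the idea (a keyed, reversible 32-bit permutation). It is not a vetted cipher, and the names `encrypt32`, `decrypt32`, `to_base36`, and `anon_name` are made up for illustration.

```python
import hashlib
import hmac
import ipaddress

ROUNDS = 8
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def _round(key: bytes, i: int, half: int) -> int:
    """Round function: 16 bits in, 16 pseudo-random bits out."""
    msg = bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "big")


def encrypt32(key: bytes, value: int) -> int:
    """Keyed permutation of the 32-bit integers (balanced Feistel network)."""
    left, right = value >> 16, value & 0xFFFF
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 16) | right


def decrypt32(key: bytes, value: int) -> int:
    """Inverse of encrypt32: what a key holder would run to recover the IP."""
    left, right = value >> 16, value & 0xFFFF
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 16) | right


def to_base36(n: int) -> str:
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def anon_name(key: bytes, ip: str) -> str:
    """Opaque name for an unregistered editor, derived from the IPv4 address."""
    return to_base36(encrypt32(key, int(ipaddress.IPv4Address(ip))))


if __name__ == "__main__":
    print(anon_name(b"hypothetical-site-secret", "192.0.2.1"))
```

Unlike a truncated hash, a permutation never collides, so two different addresses always get different names, and anyone holding the key can map a name back to its address.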

...

It is very strange that we call IP edits "anonymous" yet they are often more revealing than edits made when logged in.

Indeed.

...

note my comment at the bottom of that ticket. :)

...

Also, many people are not aware that oversight needs to be done before the next dump in order to be useful. I often see admins removing six-month-old IP talk contribs for privacy reasons, and they are a bit surprised and annoyed when I show them the dumps.

People are also surprised when deletion fails to successfully hide information. Considering how trivial it is to run a script that saves every change as it is made.. all we can really hope to do is minimize the bleeding.

...

It would be less revealing but it would greatly amplify the ability to hide because it would be far more anonymous. Depending on the implementation it could be used as a force multiplier with a single user on a single IP churning out dozens of guest ids by flushing their cookies. Obscuring the IP would convert the IPs into effective pseudonymous names, similar to real account names. The above would create something much closer to actual anonymous edits. I doubt most Wikimedia Wikis would support a proposal like that. (though, personally, I suspect life would go on if it were done).

Anthony

17 Jun

12:19 a.m.

On Mon, Jun 16, 2008 at 1:21 AM, John Vandenberg <jayvdb(a)gmail.com> wrote:

...

On Mon, Jun 16, 2008 at 9:20 AM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

I only asked why we give the equivalent checkuser on half our users to the general public. So far only Anthony has provided a reasonable explanation.

That pretty much *is* my answer. Although, I think the main reason nobody has written the code is that no one has been provided with a spec which has any reasonable chance of being implemented. Change don't happen much 'round here.

...

Along those lines, I wonder how many people would be scared away from editing if the "edit" link took them to the account creation page. It'd be worth trying "as an experiment", if there were any mechanism to actually perform such experiments.

On Mon, Jun 16, 2008 at 3:02 PM, Nathan <nawrich(a)gmail.com> wrote:

...

We could easily implement it so that administrators automatically see IPs or can convert in one step, meaning issues of tracking and dealing with vandalism wouldn't apply.

I'm pretty sure if that were done one of the admins would leak a full table of IP to hash matchups. It'd only take one admin to send such a table to Wikitruth or something and the whole hashing scheme would be fairly useless. Counterproductive, even, because it'd provide a false sense of security.

On Mon, Jun 16, 2008 at 2:12 PM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

...

Ah... Forget it. We won't come to an understanding. ... At least I tried.

I quote Florence when I say, "Nod."

Gerard Meijssen

16 Jun

5:06 p.m.

Hoi,

This is becoming a nice game of silly buggers. I will bite anyway. One reason why we should not grant access to Special:Checkuser to everyone is because it would have all kinds of really nasty side effects. For one it would make the life of stalkers that much easier. Giving stalkers the use of this tool would effectively harm first the community and because of the implications it would then harm the project.

Thanks,
GerardM

On Mon, Jun 16, 2008 at 12:23 AM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

...

On Sun, Jun 15, 2008 at 10:31 AM, David Gerard <dgerard(a)gmail.com> wrote:

By this logic we should grant access to Special:Checkuser to everyone. No? Explain. :)

Gregory Maxwell

5:27 p.m.

On Mon, Jun 16, 2008 at 1:06 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

Okay. So why do we give a checkuser equivalent (releasing the IPs of unregistered users) to the entire world for something around half our contributors?

Andrew Whitworth

5:48 p.m.

On Mon, Jun 16, 2008 at 1:27 PM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

...

Okay. So why do we give a checkuser equivalent (releasing the IPs of unregistered users) to the entire world for something around half our contributors?

We don't "release" IPs, they are visible by default unless a person chooses to create an account. A person's IP is visible unless a person chooses to make it private. Anonymous users don't have an expectation of their IPs being kept private unless they've taken steps to make it so. Registered users do have that expectation. --Andrew Whitworth

Gregory Maxwell

6:12 p.m.

On Mon, Jun 16, 2008 at 1:48 PM, Andrew Whitworth <wknight8111(a)gmail.com> wrote:

...

On Mon, Jun 16, 2008 at 1:27 PM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

Okay. So why do we give a checkuser equivalent (releasing the IPs of unregistered users) to the entire world for something around half our contributors?

"Anonymous" people don't have the expectation of being .. anonymous. Funny that.

Meanwhile registered users routinely manage to edit logged-out with negative consequences.... Even really technically competent users. In both cases I think it is clear that we are frequently failing the expectations of our contributors.

Being realistic, most new people editing Wikipedia have no clue what an IP address is and are quite likely unaware of the consequences of their actions until it is too late. Even if they manage to realize their IP will be public, and that it could allow someone to identify them, many people can't imagine that someone would bother digging up some random edit on a website and turn it into a press event. No amount of angry red explanatory text can really solve that.

Saying that we don't release them is, in my view, playing an unhelpful semantic game. The world would not know their IPs if we did not configure our software to publish them. MediaWiki is one of fairly few web applications which publicly display users' IPs. It's certainly not the norm on the internet.

The brilliance of modern industrial safety standards is that they realize that people are often uninformed, inattentive, and sometimes outright stupid and they strive to design systems which do not cause serious harm when combined with typical users. Making a nearly irreversible public disclosure of a person's IP simply because they clicked a button without reading a kilobyte of largely irrelevant text cannot be called a harm-minimizing technique.

I'm a bit puzzled that a list that can produce hundreds of messages about stalking doesn't stand up and recognize this sort of surprise avoidance as an ethical necessity.

On Mon, Jun 16, 2008 at 1:56 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

We have always indicated to people who were interested in this that true anonymity can not be found in anonymous editing. We have always indicated that anonymity will be better preserved by making use of a user handle.

And yet we continue to find that people are surprised and confused by our practices and language. Merely telling people is not enough in all cases. If we are realistic we will realize that many people do not read or do not understand. We can not save everyone from themselves but we should strive to achieve the best outcome for the user in the majority of cases.

...

As you are continuing your game of silly buggers, it is a choice to use a user or not. If people are not troubled by sub optimal anonymity, it is their choice. At the same time they provide us with the handles to help fight vandals. Changing this will not help us produce more or better content for our projects. It makes our servers less responsive so I do not think this should have much or any priority..

You say that people are making a conscious choice to be less anonymous (by using our 'anonymous editing'), but most Wikipedians would agree that unregistered users are a larger source of trouble edits than named users. If your goal was to cause trouble and you were able to make an informed choice, wouldn't you always choose to be more anonymous rather than less? I think it's a lot more likely that many people have no clue. Ah... Forget it. We won't come to an understanding. ... At least I tried.

Nathan

7:02 p.m.

Switching from IPs to randomized or encrypted numerical identifiers strikes me as a major change at first, but thinking about it and reading Gregory's rationale I'm not so sure. We could easily implement it so that administrators automatically see IPs or can convert in one step, meaning issues of tracking and dealing with vandalism wouldn't apply. I'm sure I'm missing the other technical and procedural drawbacks - can someone explain them to me? It'd obviously disable Wikiscanner as a public tool, and it would more than likely hit the press at some level, but what other detrimental effects would this change have? Nathan

Henning Schlottmann

17 Jun

11:16 a.m.

Gregory Maxwell wrote:

...

"Anonymous" people don't have the expectation of being .. anonymous.

Editing without registration is not so much about anonymity but about openness and convenience. We allow anyone to edit Wikipedia - without prior registration. That's the idea about it. Their edits are registered under their technical identifier on the internet which is of course the IP address. It is not about being untraceable. Nothing on the edit window you see if not logged in claims anything about anonymity. Ciao Henning

David Gerard

11:37 a.m.

2008/6/17 Henning Schlottmann <h.schlottmann(a)gmx.net>:

...

Gregory Maxwell wrote:

...

> "Anonymous" people don't have the expectation of being .. anonymous.

...

On en:wp, it warns as clearly as it can: "You are not currently logged in. Editing this way will cause your IP address to be recorded publicly in this page's edit history. If you create an account, you can conceal your IP address and be provided with many other benefits. Messages sent to your IP can be viewed on your talk page." I suppose we could wrap it in <font color="red"><big><blink> ... - d.

Andrew Gray

8:15 a.m.

2008/6/16 Andrew Whitworth <wknight8111(a)gmail.com>:

...

Well, unregistered users *who are familiar with the way we do things* don't have an expectation of their IPs being kept private. There are a lot of unregistered users who aren't familiar with the way we do things, and get rather shocked to discover we're publishing their IP - it's a very common complaint. (They usually also misunderstand quite what the IP is, but that's another issue) -- - Andrew Gray andrew.gray(a)dunelm.org.uk

Gerard Meijssen

16 Jun

5:56 p.m.

Hoi,

We have always indicated to people who were interested in this that true anonymity can not be found in anonymous editing. We have always indicated that anonymity will be better preserved by making use of a user handle.

As you are continuing your game of silly buggers, it is a choice to use a user or not. If people are not troubled by sub optimal anonymity, it is their choice. At the same time they provide us with the handles to help fight vandals. Changing this will not help us produce more or better content for our projects. It makes our servers less responsive so I do not think this should have much or any priority..

Thanks,
GerardM

On Mon, Jun 16, 2008 at 7:27 PM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:

...

On Mon, Jun 16, 2008 at 1:06 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

Hoi, This is becoming a nice game of silly buggers. I will bite anyway. One reason why we should not grant access to Special:Checkuser to everyone is because it would have all kinds of really nasty side effects. For one it would make the life of stalkers that much easier. Giving stalkers the use of this tool would effectively harm first the community and because of the implications it would then harm the project.

Okay. So why do we give a checkuser equivalent (releasing the IPs of unregistered users) to the entire world for something around half our contributors?

Gregory Maxwell

15 Jun

10:37 p.m.

On Sun, Jun 15, 2008 at 10:15 AM, geni <geniice(a)gmail.com> wrote:

...

2008/6/15 Anthony <wikimail(a)inbox.org>:

It's hard not to. If we were to say assign a random number to every IP then by now someone would have published a partial list of number to IP relationships.

How? I can't see how they could do this except by even more limited means than they can use to currently publish User name->IP connections.

The only means I can see someone connecting an opaque ID with an IP is:

1. Actually editing from that IP and recording the result.
2. Tricking a user on that IP into following an external link.
3. Checkuser
4. Compromise of the foundation servers.

... All of those are a much higher hurdle than the casual leaks users perform on their own all the time. For example, today, just minutes after complaining about it I was somehow logged out on meta and managed to accidentally disclose my IP. It's a constant problem.

We could also do blocked encryption for partial addresses: Encrypt the first 24 bits, then the whole 32 bits. This would leak a lot more information, but it would preserve the ability for everyone to quickly tell if two unregistered users are on the same /24.
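The partial-address idea in the last paragraph above might be sketched like this. One substitution to note plainly: the message says "encrypt", but to keep the example short this uses a keyed, shortened HMAC as a stand-in, which (unlike a real cipher) cannot be reversed and could in principle collide. The names `_tag` and `two_level_id` and the output format are assumptions for illustration only.

```python
import hashlib
import hmac
import ipaddress


def _tag(key: bytes, label: bytes, data: bytes) -> str:
    """Keyed, shortened hash used here as a stand-in for encryption."""
    return hmac.new(key, label + data, hashlib.sha256).hexdigest()[:10]


def two_level_id(key: bytes, ip: str) -> str:
    """Return '<token for the /24>.<token for the full address>'."""
    packed = ipaddress.IPv4Address(ip).packed
    network_part = _tag(key, b"/24:", packed[:3])   # first 24 bits only
    host_part = _tag(key, b"/32:", packed)          # all 32 bits
    return network_part + "." + host_part


if __name__ == "__main__":
    key = b"hypothetical-site-secret"
    print(two_level_id(key, "192.0.2.1"))
    print(two_level_id(key, "192.0.2.77"))   # same first part, different second part
```

Two editors whose identifiers share the first part are on the same /24, which anyone can see without learning the addresses themselves.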

...

If the number assigns keep changing well we know the problems that we had with AOL back in the day.

I don't see a huge need to make the numbers change.. but we could address this if we wanted to. We could provide a two part identifier for unregistered users: Enc(Secret[n-1], IP), Enc(Secret[n], IP) and increment n every 3 months, so if a particular IP goes 6 months between edits the connection will be broken. Given the rate of IP reassignment in the internet doing this would be reasonable.. but I don't see why it would be necessary.

For example:

On day one an unregistered user would look like User:.AY3BXQM,B4WVJAM
Three months later: User:.B4WVJAM,W93GI2A
Three months later: User:.W93GI2A,CT7WLMA

If the user didn't make the middle edit the unregistered identities would become disconnected except for checkusers.

::shrugs:: As I said, I don't see the need of anything that complex.
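A small sketch of the rotating two-part identifier described above, with assumptions labelled: Enc is again replaced by a keyed HMAC shortened to 7 base32 characters (chosen only to resemble the sample names in the message), and `token`, `anon_identifier`, and the list of per-period secrets are hypothetical.

```python
import base64
import hashlib
import hmac
import ipaddress


def token(secret: bytes, ip: str) -> str:
    """Stand-in for Enc(Secret[n], IP): a keyed hash shortened to 7 characters."""
    packed = ipaddress.IPv4Address(ip).packed
    digest = hmac.new(secret, packed, hashlib.sha256).digest()
    return base64.b32encode(digest).decode("ascii")[:7]


def anon_identifier(period_keys: list, n: int, ip: str) -> str:
    """Two-part name from the previous (n-1) and current (n) period secrets; n >= 1."""
    return "{},{}".format(token(period_keys[n - 1], ip), token(period_keys[n], ip))


if __name__ == "__main__":
    keys = [b"q0", b"q1", b"q2", b"q3"]      # one hypothetical secret per 3-month period
    for n in (1, 2, 3):
        print("User:." + anon_identifier(keys, n, "192.0.2.1"))
```

Consecutive periods share one part, so edits three months apart can still be linked; names two periods apart share nothing, which is the "disconnected" case in the example.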

Anthony

2:54 p.m.

Something else I think is worth pointing out: "the raw log data is not made public, and is normally discarded after about two weeks." has changed to "The raw log data is kept indefinitely, but is not made public." I get the impression that this isn't a change in policy, so much as a change in wording. But then, it does seem to contradict the Data Retention Policy (http://wikimediafoundation.org/wiki/Resolution:Data_Retention_Policy).

Matthew Brown

9:06 p.m.

On Sun, Jun 15, 2008 at 7:54 AM, Anthony <wikimail(a)inbox.org> wrote:

...

Not necessarily a contradiction, given the ambiguity of wording of both statements. 'Indefinitely' just means "for an undefined period of time", after all. It just means that the privacy policy no longer states any retention period. -Matt

Luna

9:36 p.m.

Thanks for announcing this on the mailing list, Florence. Some interesting discussion at http://meta.wikimedia.org/wiki/Talk:Draft_Privacy_Policy_June_2008 -- in particular regarding a rewritten (and considerably shorter) draft.

On Sun, Jun 15, 2008 at 2:06 PM, Matthew Brown <morven(a)gmail.com> wrote:

...

'Indefinitely' just means "for an undefined period of time", after all. It just means that the privacy policy no longer states any retention period.

Not specifically replying to this, but it was a tempting snippet to quote. Seems easiest to specify the retention period outside of the privacy policy -- it may be enough to just say it's discarded periodically. -Luna

Brion Vibber

21 Jun

7:22 p.m.

Anthony wrote:

...

It's an update to reflect actual practice.

...

But then, it does seem to contradict the Data Retention Policy (http://wikimediafoundation.org/wiki/Resolution:Data_Retention_Policy).

The data retention policy is, shall we say, super vague. It makes no specific provisions, but iterates our general preference for not keeping lots of private data around for a long time. CheckUser data -- the scariest for most people as it records IP, proxy forward records, and user-agent for ALL EDITS -- is kept for 90 days in the database. We currently keep 1:1000-sampled *HTTP proxy logs* indefinitely; every once in a while the whole bunch gets scanned over to batch out some ad-hoc statistics. At some point we expect to normalize that process, at which point we'll have no need to keep around the old log samples. (These would be basically worthless for any kind of track-down-a-user lookup since there's a 99.9% chance that whatever you're looking for isn't in it, and even if it is you don't have enough information in the log to tell.) Various debug logs are also kept indefinitely, and from time to time old ones get thrown out. -- brion
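The 99.9% figure above follows from the 1:1000 sampling rate when looking for one particular request. A rough sketch of that arithmetic, assuming each request is sampled independently (an assumption; the message does not describe how the sampling is done) and using a hypothetical helper `chance_sampled`:

```python
def chance_sampled(requests, rate=1 / 1000):
    """Probability that at least one of `requests` hits lands in the sample."""
    return 1.0 - (1.0 - rate) ** requests


# A single request is missing from the sample 99.9% of the time.
print(1.0 - chance_sampled(1))

# Chance that at least one line survives as the number of requests grows.
for requests in (1, 10, 100, 1000):
    print(requests, round(chance_sampled(requests), 4))
```

The chance that some line from a heavy user survives grows with the number of requests, but, as the message notes, a sampled line still lacks the information needed to tell whose it is.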

Anthony

8:53 p.m.

On Sat, Jun 21, 2008 at 3:22 PM, Brion Vibber <brion(a)wikimedia.org> wrote:

...

The data retention policy is, shall we say, super vague. It makes no specific provisions, but iterates our general preference for not keeping lots of private data around for a long time.

Ron Moss

22 Jun

12:18 a.m.

I am glad to hear that you are a man of the law. I can depend on you, so I will share the following: "The original income tax was restricted to federal sources to be legal: Post Office workers, federal judges, the military, etc. Then, with the excitement of World War 2 coming to an end, the IRS took advantage of that atmosphere and forgot to tell us that our private income was no longer taxable, and it still isn't." Google Peter Hendrickson for the most accurate account. His book is in its ninth printing. Then google "Sherry Peel Jackson and Joe Banister", both former IRS goons. Mafia type. Thanks, Anthony. I have entrusted you with these secrets. Don't tell anybody. Ron Moss

----- Original Message ----- From: "Anthony" <wikimail(a)inbox.org> To: "Wikimedia Foundation Mailing List" <foundation-l(a)lists.wikimedia.org> Sent: Saturday, June 21, 2008 1:53 PM Subject: Re: [Foundation-l] New draft of privacy policy

...

On Sat, Jun 21, 2008 at 3:22 PM, Brion Vibber <brion(a)wikimedia.org> wrote:

The data retention policy is, shall we say, super vague. It makes no specific provisions, but iterates our general preference for not keeping lots of private data around for a long time.

"the least amount of personally identifiable information consistent with maintenance of its services, with its privacy policy, or as required by state or federal legal provisions under United States of America law" is fairly specific. If you can keep less, but still fulfill your services, the privacy policy, and the law, then the data retention policy states that you should do so. _______________________________________________ foundation-l mailing list foundation-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Brion Vibber

24 Jun

6:13 a.m.

Anthony wrote:

...

On Sat, Jun 21, 2008 at 3:22 PM, Brion Vibber <brion(a)wikimedia.org> wrote:

The data retention policy is, shall we say, super vague. It makes no specific provisions, but iterates our general preference for not keeping lots of private data around for a long time.

The key phrase is "consistent with maintenance of its services", which is very much open-ended. It does not say anything specific about what we can or can't keep, or how quickly we should discard data or how long we must keep it, but simply indicates that we should:

* keep any data as required by law
* keep any data as long as we need it for an actual purpose
* not keep data we don't need for an actual purpose

This is not to say that we will look up everyone's IP and store a giant table forever listing your address that we looked up on Google Maps with a picture of your house. :)

The point is simply that the data retention policy (which states our preference to not retain unnecessary data, as a goal to *drive* our actual behavior, for the company) is not redundant to the privacy policy (which lists specific things we do keep, as a *description* of our actual behavior, for the end-user). -- brion

Florence Devouard

21 Jun

1:26 p.m.

New subject: [Foundation-l] New draft of privacy policy (urgent)

Florence Devouard wrote:

...

Hello, Mike has written a new version of the privacy policy, taking into account comments made on meta and this list up to the 19th of June. You may find this new version here: http://meta.wikimedia.org/wiki/Draft_Privacy_Policy_June_19_2008 I apologize to those of you who worked on another draft in parallel. In an ideal world, people would agree to work in the same place together (e.g., meta), rather than a mix of private and public wikis. And in an ideal world, the Chair would not have to go back and forth between people to make sure input from everyone is collected. But unfortunately (breaking news !!!), we do not live in an ideal world. So, here is the new version; and, if possible, I'd like all those who made comments until now, or those who worked on the parallel version, to check whether the new "lawyer approved" version seems fine or not. Please do not hesitate to give feedback. Bad news: the board meeting is this evening and I am running out of time this afternoon. So, if you guys think there is still work to do on it, I'll remove it from the agenda this evening and it will be put on the July agenda. If there are no comments, I'll put it up for a vote. Sorry for the short notice. Anthere

Thomas Dalton

2:19 p.m.

Quick response, since you're in a hurry and I don't have much time right now:

1) Still too long and rambling. It's clearly written with the intention of educating users about privacy on Wikimedia projects, rather than primarily as a binding policy on WMF, yet no user is ever going to read it. Users will load the page, see how long it is and close the page again. If you want to educate users, you need another way to do it. Make the privacy policy simply a policy, a list of things that WMF commits to doing.

2) Why abbreviate "Wikimedia projects" to "WMProjects"? It's not a commonly used abbreviation, isn't much shorter and just makes the text harder to read. Either keep the full name, or abbreviate to "the projects".

Tango

PS I'm trying to remember to sign my emails since it's been brought to my attention that a lot of people seem to think I'm someone completely different, which can't be a good thing (either for him or me, I haven't decided!! ;)).

Jesse Plamondon-Willard

3:55 p.m.

Thomas Dalton <thomas.dalton(a)gmail.com> wrote:

...

Still too long and rambling. It's clearly written with the intention of educating users about privacy on Wikimedia projects, rather than primarily as a binding policy on WMF, yet no user is ever going to read it.

Agreed; I don't think this is ready for approval. The policy should only contain policy, and link elsewhere for documentation and philosophy. This is particularly important since we cannot regularly edit the privacy policy to correct the documentation and philosophy. -- Yours cordially, Jesse Plamondon-Willard (Pathoschild)

Jussi-Ville Heiskanen

2:29 p.m.

Florence Devouard wrote:

...

"The Wikimedia Foundation that maintaining and preserving the privacy of user data is an important value." That sentence has no verb... Yours; Jussi-Ville Heiskanen

Ryan

22 Jun

7:13 a.m.

On Sat, Jun 21, 2008 at 9:26 AM, Florence Devouard <Anthere9(a)yahoo.com> wrote:

...

First of all, thank you Florence and Mike for your work on this.

I noticed that it doesn't mention anywhere the possibility that the policy may be altered in the future. Most sites, including Yahoo and Google, do so. Is this omission accidental or deliberate? Is such a mention either necessary or encouraged legally?

In general, I agree with others that the policy might be worth splitting up. But if it isn't, I think it should be pruned. For example, I'm looking at Section VIII, Point C. Why in the world is it necessary in a privacy policy to specifically mention "badly-behaved web spiders" as a possible reason for examining log data?

The mention of IRC is strange. IRC is not a Wikimedia venue, so perhaps it should be removed completely. But if it is to be left, why was the mention about the possible exposure of IPs deleted? Surely that's an important privacy concern regarding IRC (where the IRC guidelines have nothing to do with privacy).

Section IV also reads more like a manual than a policy. Perhaps that was the intent, but I think more than anything, we should be informing our users, not teaching them.

Perhaps my main point is this:

-- Yahoo privacy policy: 1,427 words
-- Google privacy policy: 1,858 words
-- Myspace privacy policy: 2,322 words
-- WMF current privacy policy: 1,767 words
-- WMF privacy policy draft: 5,081 words

A privacy policy should not be a TL;DR for the majority of our contributors, even if a clear majority will never read it. I imagine the ones that would read it would appreciate (relative) brevity. -- [[User:Ral315]]

Gerard Meijssen

24 Jun

11:29 a.m.

Hoi, Please eat your own dog food ... TL;DR is what ? Thanks, GerardM On Sun, Jun 22, 2008 at 9:13 AM, Ryan <wiki.ral315(a)gmail.com> wrote:

...

On Sat, Jun 21, 2008 at 9:26 AM, Florence Devouard <Anthere9(a)yahoo.com> wrote:

effe iets anders

11:40 a.m.

Hi, I already asked him privately; apparently it is internet slang for "Too long; didn't read": http://en.wiktionary.org/wiki/TLDR Best regards, Lodewijk 2008/6/24 Gerard Meijssen <gerard.meijssen(a)gmail.com>:

...

Hoi, Please eat your own dog food ... TL;DR is what ? Thanks, GerardM On Sun, Jun 22, 2008 at 9:13 AM, Ryan <wiki.ral315(a)gmail.com> wrote:

On Sat, Jun 21, 2008 at 9:26 AM, Florence Devouard <Anthere9(a)yahoo.com> wrote:

wikimedia-l@lists.wikimedia.org

38 comments

21 participants

participants (21)

Andrew Gray
Andrew Whitworth
Anthony
Brion Vibber
David Gerard
effe iets anders
Florence Devouard
geni
Gerard Meijssen
Gregory Maxwell
Henning Schlottmann
Jesse Plamondon-Willard
John Vandenberg
Jussi-Ville Heiskanen
Luna
Matthew Brown
Nathan
Ray Saintonge
Ron Moss
Ryan
Thomas Dalton