I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
-- Best, Jon
[User:NonvocalScream]
On Wed, Sep 10, 2008 at 10:43 PM, Jon scream@datascreamer.com wrote:
I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
Best, Jon
[User:NonvocalScream]
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
http://wikimediafoundation.org/wiki/Resolution:Data_retention_policy
-Chad
Chad wrote:
On Wed, Sep 10, 2008 at 10:43 PM, Jon scream@datascreamer.com wrote: I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
http://wikimediafoundation.org/wiki/Resolution:Data_retention_policy
-Chad
Thank you, but that does not answer my question however. What are the servers technically set for?
-- Best, Jon
[User:NonvocalScream]
Jon wrote:
I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
CheckUser data used to be kept for 3 months, but Aaron recently increased it to 5 months. I'm not sure why or on whose authority.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUser.php?r1=39734&r2=40620
-- Tim Starling
On Wed, Sep 10, 2008 at 11:11 PM, Tim Starling tstarling@wikimedia.org wrote:
Jon wrote:
I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
CheckUser data used to be kept for 3 months, but Aaron recently increased it to 5 months. I'm not sure why or on whose authority.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUser.php?r1=39734&r2=40620
I think Jon was inquiring about more than just checkuser (notice the "such as"). I would assume that anyone asking about data retention in general is not overly concerned with the specific modes of retention, but is more concerned with the maximum retention time (across all modes) of any particular type of private data.
Gregory Maxwell wrote:
On Wed, Sep 10, 2008 at 11:11 PM, Tim Starling tstarling@wikimedia.org wrote:
Jon wrote:
I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
CheckUser data used to be kept for 3 months, but Aaron recently increased it to 5 months. I'm not sure why or on whose authority.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUser.php?r1=39734&r2=40620
I think Jon was inquiring about more than just checkuser (notice the "such as"). I would assume that anyone asking about data retention in general is not overly concerned with the specific modes of retention, but is more concerned with the maximum retention time (across all modes) of any particular type of private data.
The other logs are not automatically rotated, and need to be manually purged. The retention time is thus not consistent. Typically we have kept around 6 months of data. There are error logs, and logs for various kinds of special requests. They are not used for sockpuppet investigation.
I've said in the past that I think 6 months would be a reasonable horizon for all private data -- it would give us plenty of data for operations, and would be a far shorter period than that used by the large commercial websites.
-- Tim Starling
On Thu, Sep 11, 2008 at 12:26 AM, Tim Starling tstarling@wikimedia.org wrote:
I've said in the past that I think 6 months would be a reasonable horizon for all private data -- it would give us plenty of data for operations, and would be a far shorter period than that used by the large commercial websites.
Not that far, for some: Google is cutting back to nine months soon.
http://googleblog.blogspot.com/2008/09/another-step-to-protect-user-privacy....
Is there a plan to detail all of this including time frames in a more public location, such as the privacy policy? If not, why not?
Joe
On Wed, Sep 10, 2008 at 9:26 PM, Tim Starling tstarling@wikimedia.org wrote:
Gregory Maxwell wrote:
On Wed, Sep 10, 2008 at 11:11 PM, Tim Starling tstarling@wikimedia.org wrote:
Jon wrote:
I could not find this in the privacy policy... however, what is Wikimedia's current data retention policy? That is to ask, how long do projects keep data for use in tools such as checkuser?
CheckUser data used to be kept for 3 months, but Aaron recently increased it to 5 months. I'm not sure why or on whose authority.
<http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUs...
I think Jon was inquiring about more than just checkuser (notice the "such as"). I would assume that anyone asking about data retention in general is not overly concerned with the specific modes of retention, but is more concerned with the maximum retention time (across all modes) of any particular type of private data.
The other logs are not automatically rotated, and need to be manually purged. The retention time is thus not consistent. Typically we have kept around 6 months of data. There are error logs, and logs for various kinds of special requests. They are not used for sockpuppet investigation.
I've said in the past that I think 6 months would be a reasonable horizon for all private data -- it would give us plenty of data for operations, and would be a far shorter period than that used by the large commercial websites.
-- Tim Starling
2008/9/15 Joe Szilagyi szilagyi@gmail.com:
Is there a plan to detail all of this including time frames in a more public location, such as the privacy policy? If not, why not?
The policy is meant to be what the foundation commits to doing; it doesn't describe what they actually do. As long as what they actually do satisfies the policy, everything is ok.
Maybe legally, but it would be much better if they actually told us what they are doing.
On Mon, Sep 15, 2008 at 11:10 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
2008/9/15 Joe Szilagyi szilagyi@gmail.com:
Is there a plan to detail all of this including time frames in a more public location, such as the privacy policy? If not, why not?
The policy is meant to be what the foundation commits to doing; it doesn't describe what they actually do. As long as what they actually do satisfies the policy, everything is ok.
2008/9/15 mboverload mboverloadlister@gmail.com:
Maybe legally, but it would be much better if they actually told us what they are doing.
Maybe, but not in the privacy policy.
On Mon, Sep 15, 2008 at 4:33 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
2008/9/15 mboverload mboverloadlister@gmail.com:
Maybe legally, but it would be much better if they actually told us what they are doing.
Maybe, but not in the privacy policy.
Why not? What's the point of having a privacy policy if you're going to make it a long-winded version of "we can do anything we want to do"?
On Mon, Sep 15, 2008 at 10:34 PM, Anthony wikimail@inbox.org wrote:
On Mon, Sep 15, 2008 at 4:33 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
2008/9/15 mboverload mboverloadlister@gmail.com:
Maybe legally, but it would be much better if they actually told us what they are doing.
Maybe, but not in the privacy policy.
Why not? What's the point of having a privacy policy if you're going to make it a long-winded version of "we can do anything we want to do"?
Agreed. I too see no reason why it hurts to have this relevant information added to the privacy policy. If not that, then the data retention policy. Either seems like a good location to put this information.
-Chad
On Mon, Sep 15, 2008 at 7:59 PM, Chad innocentkiller@gmail.com wrote:
Agreed. I too see no reason why it hurts to have this relevant information added to the privacy policy. If not that, then the data retention policy. Either seems like a good location to put this information.
-Chad
I see no reason to object to the release of this information, as well. I mean, jebus, this is Wikipedia, the darling of open and free information.
2008/9/16 Anthony wikimail@inbox.org:
On Mon, Sep 15, 2008 at 4:33 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
2008/9/15 mboverload mboverloadlister@gmail.com:
Maybe legally, but it would be much better if they actually told us what they are doing.
Maybe, but not in the privacy policy.
Why not? What's the point of having a privacy policy if you're going to make it a long-winded version of "we can do anything we want to do"?
It doesn't say that at all. The privacy policy is designed to be the absolute minimum that the foundation commits to doing. In most cases, the foundation actually goes a little further than that. There is no harm in publishing that actual practice, but the foundation shouldn't be committing to continuing that practice - it chose what to commit to very carefully when writing the policy and with good reason, restricting itself further would make things harder without significant gain.
Let me give an example with made-up numbers. If the policy says "We will not keep logs longer than 2 months", however actual practice is to only keep them one month, then that's fine, we're following policy. If someone then makes a mistake and forgets to delete the logs before they go home on Friday night, and deletes them when they get back in on Monday when they're a month and a day old, there is no problem, because we're still within policy. If we'd changed policy to describe what usually happened, we'd now have violated the policy. It's always good to make actual practice a little stricter than policy in order to absorb mistakes - that doesn't work if you then change the policy to describe the practice.
Hello,
harm in publishing that actual practice, but the foundation shouldn't be committing to continuing that practice - it chose what to commit to very carefully when writing the policy and with good reason, restricting itself further would make things harder without significant gain.
When we initiated the idea of having better privacy policy and data retention documentation, the idea of making actual numbers public was always there. Still, people have to realize that the actual numbers can be different from a week-old documentation, and various copies and interpretations of documentation can be desynched too.
It is very difficult to commit, by resolution, to describing every aspect of data retention accurately.
Thomas Dalton wrote:
2008/9/16 Anthony wikimail@inbox.org:
On Mon, Sep 15, 2008 at 4:33 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
2008/9/15 mboverload mboverloadlister@gmail.com:
Maybe legally, but it would be much better if they actually told us what they are doing.
Maybe, but not in the privacy policy.
Why not? What's the point of having a privacy policy if you're going to make it a long-winded version of "we can do anything we want to do"?
It doesn't say that at all. The privacy policy is designed to be the absolute minimum that the foundation commits to doing. In most cases, the foundation actually goes a little further than that. There is no harm in publishing that actual practice, but the foundation shouldn't be committing to continuing that practice - it chose what to commit to very carefully when writing the policy and with good reason, restricting itself further would make things harder without significant gain.
Let me give an example with made-up numbers. If the policy says "We will not keep logs longer than 2 months", however actual practice is to only keep them one month, then that's fine, we're following policy. If someone then makes a mistake and forgets to delete the logs before they go home on Friday night, and deletes them when they get back in on Monday when they're a month and a day old, there is no problem, because we're still within policy. If we'd changed policy to describe what usually happened, we'd now have violated the policy. It's always good to make actual practice a little stricter than policy in order to absorb mistakes - that doesn't work if you then change the policy to describe the practice.
Precisely, Thomas, thanks. I was just coming here to write something to this effect, but you beat me to it :-)
Basically, policy contains our high-level minimum commitment: what we want to do, intend to do, and can & will commit to continuing doing. It's the minimum that users can expect from us.
And as Domas points out (and Tim proves with his Aaron example, upthread), you don't want to commit at the policy level to things that may change, particularly if the change can take place with no warning or intent.
On 9/15/08, Joe Szilagyi szilagyi@gmail.com wrote:
CheckUser data used to be kept for 3 months, but Aaron recently increased it to 5 months. I'm not sure why or on whose authority.
<http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUs...
I think Jon was inquiring about more than just checkuser (notice the "such as"). I would assume that anyone asking about data retention in general is not overly concerned with the specific modes of retention, but is more concerned with the maximum retention time (across all modes) of any particular type of private data.
The other logs are not automatically rotated, and need to be manually purged. The retention time is thus not consistent. Typically we have kept around 6 months of data. There are error logs, and logs for various kinds of special requests. They are not used for sockpuppet investigation.
I've said in the past that I think 6 months would be a reasonable horizon for all private data -- it would give us plenty of data for operations, and would be a far shorter period than that used by the large commercial websites.
Is there a plan to detail all of this including time frames in a more public location, such as the privacy policy? If not, why not?
I think the basis for this change is that checkuser data older than 5 months has a 1% chance of being deleted with each passing edit[1], and will [[almost surely]] be deleted before any of it becomes 6 months old.
To get a rough idea of how many edits we get per month I picked a recentchanges diff from a few minutes ago and compared it to an edit from a month earlier:
Edit at 15:07, 17 September 2008 by User:Hapsala on [[2008 Russian financial crisis]] [2]
Edit at 15:07, 17 August 2008 by User:118d on [[Buses in London]] [3]
6535756 edits in this selected one-month period.
I just realized August has 31 days so let's round it down to 6.5 million :P
So there is a one percent chance of initiating a checkuser data purge, and a 0.99 chance that all data will remain intact for the time being. This 99% may seem high, but it becomes negligible over a month. I don't know what the exact odds are as I cannot find a calculator that gives me a non-zero result for "0.99 ^ (6.5 million)" [4].
Disclaimer: I don't know what kind of random number generator is being used here so I can't comment on the integrity of it.
But I can say that the numbers game becomes less laughable on smaller projects. Let's take the [[Hungarian Wikinews]] for example, which had only 374 edits in an equal time-span[5][6].
So on this project there would be a 2.331 percent chance of over-retaining checkuser data in violation of the privacy policy[7].
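As a rough check of the figures above, here is a minimal Python sketch. It assumes each edit independently triggers a purge with probability 1/100; the function name prob_no_purge is made up for illustration and is not part of the extension.

```python
def prob_no_purge(edits, purge_chance=0.01):
    """Probability that none of `edits` independent edits triggers a purge."""
    return (1.0 - purge_chance) ** edits


# Small wiki: 374 edits in the month (the Hungarian Wikinews count above).
small = prob_no_purge(374)
print(f"{small:.3%}")  # roughly 2.331%

# Large wiki: the true value is far below what a double can represent,
# so ordinary floating point reports zero, much like the online calculators.
large = prob_no_purge(6_500_000)
print(large)
```

Under the same assumption, the expected wait before a purge is triggered is 100 edits, which is why the edit rate of the wiki matters so much.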
In any case the $wgCUDMaxAge has been reverted back to three months[8]. Probably a good call.
—C.W.
P.S. I love the edit summary[9]. Tim should run for arbcom, or something.
[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUs...
[2] http://en.wikipedia.org/w/index.php?title=2008_Russian_financial_crisis&...
[3] http://en.wikipedia.org/w/index.php?title=Buses_in_London&diff=232502170
[4] http://www.google.com/search?q=.99+%5E+6500000
[5] http://hu.wikinews.org/w/index.php?title=Sablon:Olaj&diff=9244
[6] http://hu.wikinews.org/w/index.php?title=Szerkeszt%C5%91:Gondnok&diff=88...
[7] http://www.google.com/search?q=.99+%5E+374
[8] http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUs...
[9] http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=40847
So there is a one percent chance of initiating a checkuser data purge, and a 0.99 chance that all data will remain intact for the time being. This 99% may seem high, but it becomes negligible over a month. I don't know what the exact odds are as I cannot find a calculator that gives me a non-zero result for "0.99 ^ (6.5 million)" [4].
0.99^6500000=5.819478586x10^-28372
To put that in perspective, if we'd done this every month since the universe began (about 14 billion years ago), the probability of there being at least one month in which it didn't get purged would be... hmmm... "0". So much for Maple...
2008/9/17 Thomas Dalton thomas.dalton@gmail.com:
So there is a one percent chance of initiating a checkuser data purge, and a 0.99 chance that all data will remain intact for the time being. This 99% may seem high, but it becomes negligible over a month. I don't know what the exact odds are as I cannot find a calculator that gives me a non-zero result for "0.99 ^ (6.5 million)" [4].
0.99^6500000=5.819478586x10^-28372
To put that in perspective, if we'd done this every month since the universe began (about 14 billion years ago), the probability of there being at least one month in which it didn't get purged would be... hmmm... "0". So much for Maple...
After realising that with numbers this small 1st order approximations are almost exact, it's 9.776724024x10^-28361, which is no more in perspective than before... I think it's best just to say "It'll never happen" and be done with it!
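A quick way to reproduce this kind of estimate, sketched in Python. It assumes 14 billion years of 12 months each and 6.5 million edits per month, and uses the first-order approximation that the chance of at least one unpurged month is about (number of months) x (chance per month). Everything is kept in log10 so nothing underflows; the constant names are illustrative only.

```python
import math

EDITS_PER_MONTH = 6_500_000   # rounded figure used in this thread
MONTHS = 14e9 * 12            # assumption: 14 billion years, 12 months each

# log10 of the chance that a whole month passes without any purge
log10_p = EDITS_PER_MONTH * math.log10(0.99)

# First-order approximation: P(at least one such month) ~= MONTHS * p
log10_any = math.log10(MONTHS) + log10_p

# Split into mantissa and power of ten for printing
exp10 = math.floor(log10_any)
mantissa = 10 ** (log10_any - exp10)
print(f"{mantissa:.4f} x 10^{exp10}")
```

The approximation is safe here because the per-month probability is so small that higher-order terms are negligible.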
On Wed, Sep 17, 2008 at 12:36 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
So there is a one percent chance of initiating a checkuser data purge, and a 0.99 chance that all data will remain intact for the time being. This 99% may seem high, but it becomes negligible over a month. I don't know what the exact odds are as I cannot find a calculator that gives me a non-zero result for "0.99 ^ (6.5 million)" [4].
0.99^6500000=5.819478586x10^-28372
To put that in perspective, if we'd done this every month since the universe began (about 14 billion years ago), the probability of there being at least one month in which it didn't get purged would be... hmmm... "0". So much for Maple...
I thought the checkuser data was moved out of the recentchanges table.
On Wed, Sep 17, 2008 at 2:58 PM, Anthony wikimail@inbox.org wrote:
On Wed, Sep 17, 2008 at 12:36 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
So there is a one percent chance of initiating a checkuser data purge, and a 0.99 chance that all data will remain intact for the time being. This 99% may seem high, but it becomes negligible over a month. I don't know what the exact odds are as I cannot find a calculator that gives me a non-zero result for "0.99 ^ (6.5 million)" [4].
0.99^6500000=5.819478586x10^-28372
To put that in perspective, if we'd done this every month since the universe began (about 14 billion years ago), the probability of there being at least one month in which it didn't get purged would be... hmmm... "0". So much for Maple...
I thought the checkuser data was moved out of the recentchanges table.
That is what has been said around the chatter lines. Was this documented in the SVN somewhere if so, and approved? For all Wikis? Just some?
- Joe
2008/9/18 Joe Szilagyi szilagyi@gmail.com:
On Wed, Sep 17, 2008 at 2:58 PM, Anthony wikimail@inbox.org wrote:
I thought the checkuser data was moved out of the recentchanges table.
That is what has been said around the chatter lines. Was this documented in the SVN somewhere if so, and approved? For all Wikis? Just some?
It's true: if you want to hide something, put it in the manual.
http://www.mediawiki.org/wiki/Extension:CheckUser
- d.
Multiple replies below.
Charlotte Webb wrote:
But I can say that the numbers game becomes less laughable on smaller projects. Let's take the [[Hungarian Wikinews]] for example, which had only 374 edits in an equal time-span[5][6].
So on this project there would be a 2.331 percent chance of over-retaining checkuser data in violation of the privacy policy[7].
Keeping data for more than 6 months does not violate the privacy policy.
Thomas Dalton wrote:
0.99^6500000=5.819478586x10^-28372
Note that if you don't have an arbitrary precision calculator handy:
0.99^6500000 = 10^(6500000 * log_10(0.99)) ~= 10^-28371.
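The logarithm trick above can be sketched in a few lines of Python, using only the standard math module. The helper name power_as_sci is hypothetical; it returns the mantissa and the power of ten separately so that the result never has to fit in a floating-point number.

```python
import math


def power_as_sci(base, exponent):
    """Return (mantissa, exp10) such that base**exponent ~= mantissa * 10**exp10."""
    log10_value = exponent * math.log10(base)
    exp10 = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exp10)
    return mantissa, exp10


mantissa, exp10 = power_as_sci(0.99, 6_500_000)
print(f"{mantissa:.3f} x 10^{exp10}")  # 5.819 x 10^-28372
```

Here 6500000 * log_10(0.99) is about -28371.24, so the value is a little below 10^-28371, which agrees with the arbitrary-precision figure quoted from Thomas.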
Anthony wrote:
I thought the checkuser data was moved out of the recentchanges table.
Yes, over a year ago: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=21016
But we're not talking about the recentchanges table. The cu_changes table is purged in exactly the same way:
    # Every 100th edit, prune the checkuser changes table.
    if( 0 == mt_rand( 0, 99 ) ) {
        # Periodically flush old entries from the recentchanges table.
        global $wgCUDMaxAge;
        $cutoff = $dbw->timestamp( time() - $wgCUDMaxAge );
        $recentchanges = $dbw->tableName( 'cu_changes' );
        $sql = "DELETE FROM $recentchanges WHERE cuc_timestamp < '{$cutoff}'";
        $dbw->query( $sql );
    }
In fact, it's so similar, that whoever added that forgot to rename a variable and change a comment when they copied it from the recentchanges code.
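For anyone who wants to experiment with this behaviour outside MediaWiki, here is a minimal Python sketch of the same probabilistic pruning idea. It is not the extension's code: a plain list of timestamps stands in for the cu_changes table, and maybe_prune, rng and one_in are names invented for this example.

```python
import random


def maybe_prune(timestamps, now, max_age, rng=random, one_in=100):
    """On roughly one call in `one_in`, drop entries older than now - max_age.

    Returns (remaining_timestamps, pruned_flag). Entries exactly at the
    cutoff are kept, matching the strict "<" comparison in the quoted SQL.
    """
    if rng.randrange(one_in) != 0:
        # Most calls do nothing, so old rows linger until a later edit.
        return timestamps, False
    cutoff = now - max_age
    return [ts for ts in timestamps if ts >= cutoff], True


# Simulate edits arriving one per time unit, with a retention window of 50.
rng = random.Random(2008)
rows = []
for now in range(1000):
    rows.append(now)
    rows, _ = maybe_prune(rows, now, max_age=50, rng=rng)
print(len(rows), "rows retained; oldest age:", 999 - min(rows))
```

Because the purge only runs on a random subset of edits, the oldest retained row can be older than max_age, which is the over-retention effect discussed earlier in the thread.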
Joe Szilagyi wrote:
That is what has been said around the chatter lines. Was this documented in the SVN somewhere if so, and approved? For all Wikis? Just some?
Are you implying that this change could somehow be controversial? If so, can you explain how that might be?
-- Tim Starling
On Thu, Sep 18, 2008 at 5:33 PM, Tim Starling tstarling@wikimedia.org wrote:
Joe Szilagyi wrote:
That is what has been said around the chatter lines. Was this documented in the SVN somewhere if so, and approved? For all Wikis? Just some?
Are you implying that this change could somehow be controversial? If so, can you explain how that might be?
Not inherently controversial, but I'm not clear on if the CU data retention is the same on each project, or different on each--does Chinese Wikipedia save as long as English Wikipedia? Does Commons save as long as Wikinews, etc.? If there is any change to the actual length in data retention, who makes that decision? The WMF board? Sue? The checkuser mail list? Shouldn't that sort of matter be decided with community input?
I've just been thinking aloud and wondering if the debatable value of any obfuscation of the retention length of Checkuser data, rather than clearly articulating it in public, outweighs the risk and harm to some users given that in the wake of the Poetlister incident we've seen that Checkuser data is not compromise-proof.
- Joe
On Fri, Sep 19, 2008 at 8:24 PM, Joe Szilagyi szilagyi@gmail.com wrote:
I've just been thinking aloud and wondering if the debatable value of any obfuscation of the retention length of Checkuser data, rather than clearly articulating it in public, outweighs the risk and harm to some users given that in the wake of the Poetlister incident we've seen that Checkuser data is not compromise-proof.
While checkuser data can be circumvented by clever users, I should add that checkuser evidence of Poetlister actually did reveal sockpuppeting.
On Fri, Sep 19, 2008 at 11:31 AM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Fri, Sep 19, 2008 at 8:24 PM, Joe Szilagyi szilagyi@gmail.com wrote:
I've just been thinking aloud and wondering if the debatable value of any obfuscation of the retention length of Checkuser data, rather than clearly articulating it in public, outweighs the risk and harm to some users given that in the wake of the Poetlister incident we've seen that Checkuser data is not compromise-proof.
While checkuser data can be circumvented by clever users, I should add that checkuser evidence of Poetlister actually did reveal sockpuppeting.
I know. I meant that we've seen the integrity and security of OUR checkuser records can be compromised, as Poetlister actually had Checkuser himself, as Cato on English Wikiquote. Hence my questions...
- Joe
Joe Szilagyi wrote:
On Thu, Sep 18, 2008 at 5:33 PM, Tim Starling tstarling@wikimedia.org wrote:
Joe Szilagyi wrote:
That is what has been said around the chatter lines. Was this documented in the SVN somewhere if so, and approved? For all Wikis? Just some?
Are you implying that this change could somehow be controversial? If so, can you explain how that might be?
Not inherently controversial, but I'm not clear on if the CU data retention is the same on each project, or different on each--does Chinese Wikipedia save as long as English Wikipedia? Does Commons save as long as Wikinews, etc.? If there is any change to the actual length in data retention, who makes that decision? The WMF board? Sue? The checkuser mail list? Shouldn't that sort of matter be decided with community input?
It's the same everywhere, it's three months. Neither the Board nor the executive have expressed any desire to make that decision, but they are free to weigh in if they want to. We chose the three month figure as a compromise between privacy advocates and troll hunters. The checkuser-l mailing list only represents one of those two groups, which is why I don't think it's appropriate that they should make that decision. I think community input should be encouraged, which is why I think the figure should be public.
I've just been thinking aloud and wondering if the debatable value of any obfuscation of the retention length of Checkuser data, rather than clearly articulating it in public, outweighs the risk and harm to some users given that in the wake of the Poetlister incident we've seen that Checkuser data is not compromise-proof.
Some people have said "trolls just leave the site for 3 months and come back when they know the CheckUser data has expired". The same argument would apply for any finite retention period, it's obvious that some sort of trade-off has to be made. If it's secret and short, then the trolls will work it out eventually anyway. If it's secret and long, then that's the worst possible situation for privacy.
Troll hunters can and should retain CheckUser results for particularly troublesome users, beyond the database retention period. Often, their IP address becomes public anyway, when it is found in email headers and anonymous edits.
If a troll stays away from the site for 3 months to avoid detection by CheckUser, then you should consider yourself lucky. That's one of the best possible outcomes. There are lots of ways a troll can disrupt the site continuously, regardless of the data retention time.
-- Tim Starling
I wholly agree with Tim. 3 months is a good compromise.
There's nothing wrong with wanting to catch trolls. There is also nothing wrong with wanting to uphold privacy.
Thus you need a balance of the two, and neither group should have a full say in the matter (although I am on the privacy side!).
On Fri, Sep 19, 2008 at 7:23 PM, Tim Starling tstarling@wikimedia.org wrote:
Joe Szilagyi wrote:
On Thu, Sep 18, 2008 at 5:33 PM, Tim Starling tstarling@wikimedia.org wrote:
Joe Szilagyi wrote:
That is what has been said around the chatter lines. Was this documented in the SVN somewhere if so, and approved? For all Wikis? Just some?
Are you implying that this change could somehow be controversial? If so, can you explain how that might be?
Not inherently controversial, but I'm not clear on if the CU data retention is the same on each project, or different on each--does Chinese Wikipedia save as long as English Wikipedia? Does Commons save as long as Wikinews, etc.? If there is any change to the actual length in data retention, who makes that decision? The WMF board? Sue? The checkuser mail list? Shouldn't that sort of matter be decided with community input?
It's the same everywhere, it's three months. Neither the Board nor the executive have expressed any desire to make that decision, but they are free to weigh in if they want to. We chose the three month figure as a compromise between privacy advocates and troll hunters. The checkuser-l mailing list only represents one of those two groups, which is why I don't think it's appropriate that they should make that decision. I think community input should be encouraged, which is why I think the figure should be public.
I've just been thinking aloud and wondering if the debatable value of any obfuscation of the retention length of Checkuser data, rather than clearly articulating it in public, outweighs the risk and harm to some users given that in the wake of the Poetlister incident we've seen that Checkuser data is not compromise-proof.
Some people have said "trolls just leave the site for 3 months and come back when they know the CheckUser data has expired". The same argument would apply for any finite retention period, it's obvious that some sort of trade-off has to be made. If it's secret and short, then the trolls will work it out eventually anyway. If it's secret and long, then that's the worst possible situation for privacy.
Troll hunters can and should retain CheckUser results for particularly troublesome users, beyond the database retention period. Often, their IP address becomes public anyway, when it is found in email headers and anonymous edits.
If a troll stays away from the site for 3 months to avoid detection by CheckUser, then you should consider yourself lucky. That's one of the best possible outcomes. There are lots of ways a troll can disrupt the site continuously, regardless of the data retention time.
-- Tim Starling
On Sat, Sep 20, 2008 at 7:37 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
Troll hunters can and should retain CheckUser results for particularly troublesome users, beyond the database retention period.
If they're doing that, then they need to be careful to make sure they are still following the privacy policy.
How exactly does the privacy policy even apply?
2008/9/20 Anthony wikimail@inbox.org:
On Sat, Sep 20, 2008 at 7:37 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
Troll hunters can and should retain CheckUser results for particularly troublesome users, beyond the database retention period.
If they're doing that, then they need to be careful to make sure they are still following the privacy policy.
How exactly does the privacy policy even apply?
The privacy policy dictates how private information will be handled, how could it not apply?
On Sat, Sep 20, 2008 at 8:43 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
2008/9/20 Anthony wikimail@inbox.org:
On Sat, Sep 20, 2008 at 7:37 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
Troll hunters can and should retain CheckUser results for particularly troublesome users, beyond the database retention period.
If they're doing that, then they need to be careful to make sure they are still following the privacy policy.
How exactly does the privacy policy even apply?
The privacy policy dictates how private information will be handled, how could it not apply?
I didn't ask if it applied, I asked how it applied. Maybe I should have said "in what way" instead of "how"?
One question would be, if a "troll hunter" breaks the privacy policy, who would you sue? The "troll hunter"? The WMF? Surely the public can't sue the "troll hunter" directly, as the privacy policy is only binding on the WMF. But surely the WMF would argue that it isn't responsible for the actions of its CheckUsers if it did get sued.
Another question would be, what restrictions are there on retaining CheckUser results? I don't see any.
I didn't ask if it applied, I asked how it applied. Maybe I should have said "in what way" instead of "how"?
One question would be, if a "troll hunter" breaks the privacy policy, who would you sue? The "troll hunter"? The WMF? Surely the public can't sue the "troll hunter" directly, as the privacy policy is only binding on the WMF. But surely the WMF would argue that it isn't responsible for the actions of its CheckUsers if it did get sued.
I don't know, ask a lawyer. The privacy policy certainly applies to checkusers, though, otherwise we wouldn't have an ombudsman to deal with complaints of checkusers violating it.
Another question would be, what restrictions are there on retaining CheckUser results? I don't see any.
There are restrictions on who the information can be distributed to, I'm not sure on the exact definition of "distribute" but letting other people use your computer when the information is stored on it may qualify (the policy warns that anyone with access to the servers can access the information, it doesn't warn that anyone with access to a checkuser's personal computer can access it). There may also be issues where the WMF no longer has the information so people start subpoenaing individual checkusers - would they then be bound by the WMF's commitments regarding subpoenas? (They are only in the draft policy, not the current one, but I'm sure the draft one will become current sooner or later.) I don't know how it would all work, but it's something that needs to be considered carefully. I would advise against any checkuser storing such data without having discussed it with Mike.
On Sat, Sep 20, 2008 at 9:15 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
I didn't ask if it applied, I asked how it applied. Maybe I should have said "in what way" instead of "how"?
One question would be, if a "troll hunter" breaks the privacy policy, who would you sue? The "troll hunter"? The WMF? Surely the public can't sue the "troll hunter" directly, as the privacy policy is only binding on the WMF. But surely the WMF would argue that it isn't responsible for the actions of its CheckUsers if it did get sued.
I don't know, ask a lawyer.
If I ever feel I am significantly damaged by a violation of the privacy policy, I will. Until then, I'll feel free speculating. My speculation is that the proper route would be to sue the WMF claiming that either 1) they are responsible for the actions of the CheckUser, OR 2) that they have violated the clause prohibiting sharing of information with third parties.
Of course, the problem is, the privacy policy is so enormously open-ended that just about nothing anyone does is definitively a violation of it. Mike Godwin has worked hard to ensure that the WMF is never legally responsible for anything, and the community has for the most part applauded this as a good thing.
The privacy policy certainly applies to checkusers, though, otherwise we wouldn't have an ombudsman to deal with complaints of checkusers violating it.
I'm not questioning whether or not it "applies", though. I'm questioning the manner in which it applies.
Another question would be, what restrictions are there on retaining CheckUser results? I don't see any.
There are restrictions on who the information can be distributed to,
There are restrictions on "public distribution" and there are restrictions on how "Wikimedia" can "share private information". The "public distribution" restrictions probably don't apply to information about "particularly troublesome users", and the restriction on sharing of private information only applies to sharing by "Wikimedia".
So it's hard for me to see how a CheckUser could violate the privacy policy in the first place (short of going completely crazy and publishing random results on her website), and even if they did, there's probably nothing anyone can do about it except take away their CheckUser rights for the future (which does nothing to take away the data they've already stored - Kelly Martin, for instance, has admitted to having copies of CheckUser results, and has also admitted to going on fishing expeditions CheckUsering IP addresses which show up in email headers).
There may also be issues where the WMF no longer has the information so people start subpoenaing individual checkusers - would they then be bound by the WMF's commitments regarding subpoenas? (They are only in the draft policy, not the current one, but I'm sure the draft one will become current sooner or later.) I don't know how it would all work, but it's something that needs to be considered carefully. I would advise against any checkuser storing such data without having discussed it with Mike.
Why should they heed your advice, though? Just so they can help someone they consider to be a "particularly troublesome user" escape legal consequences for her actions?
I would advise against making it so easy for so many people, most of whom have undergone no background check and many of whom have nothing to lose, to store such data in the first place. Are there really that many people who need immediate unfettered access to actual IP addresses from home?
I'd also advise anyone accessing a website run by Wikimedia to assume that their every action is being recorded and might be shared with anyone for any reason.
If I ever feel I am significantly damaged by a violation of the privacy policy, I will. Until then, I'll feel free speculating. My speculation is that the proper route would be to sue the WMF claiming that either 1) they are responsible for the actions of the CheckUser, OR 2) that they have violated the clause prohibiting sharing of information with third parties.
That would be my guess, as well.
Of course, the problem is, the privacy policy is so enormously open-ended that just about nothing anyone does is definitively a violation of it. Mike Godwin has worked hard to ensure that the WMF is never legally responsible for anything, and the community has for the most part applauded this as a good thing.
True.
Why should they heed your advice, though?
Because they think it's good advice. If they don't think it's good advice, then they should feel free to ignore it.
Just so they can help someone they consider to be a "particularly troublesome user" escape legal consequences for her actions?
Legal consequences? I don't understand... the result of being caught by a checkuser is usually a block, I wouldn't call that a legal consequence.
I would advise against making it so easy for so many people, most of whom have undergone no background check and many of whom have nothing to lose, to store such data in the first place. Are there really that many people who need immediate unfettered access to actual IP addresses from home?
There aren't many checkusers, are there? I don't think the WMF could reasonably handle all requests for checkuser, so what would your alternative be?
I'd also advise anyone accessing a website run by Wikimedia to assume that their every action is being recorded and might be shared with anyone for any reason.
That's the assumption you should make about any site in the absence of a privacy policy. Wikimedia has a privacy policy that clearly states the reasons they'll reveal information, should people really assume the WMF won't follow their own policy?
Just so they can help someone they consider to be a "particularly troublesome user" escape legal consequences for her actions?
Legal consequences? I don't understand... the result of being caught by a checkuser is usually a block, I wouldn't call that a legal consequence.
No, you were talking about subpoenas...I guess I misread you.
I would advise against making it so easy for so many people, most of whom have undergone no background check and many of whom have nothing to lose, to store such data in the first place. Are there really that many people who need immediate unfettered access to actual IP addresses from home?
There aren't many checkusers, are there? I don't think the WMF could reasonably handle all requests for checkuser, so what would your alternative be?
I don't see why checkuser can't be watered down, to simply provide a "YES/NO" answer, and that two or three WMF employees can have access to the actual IP addresses (with all but one of those employees only having it for backup/emergency purposes). Unlike in the past, blocking of a username automatically blocks the IP address, even if the user doesn't try to make any edits, so isn't checkuser basically just for investigations of ongoing abuse which can wait a few days for a WMF person to get around to it?
I'd also advise anyone accessing a website run by Wikimedia to assume that their every action is being recorded and might be shared with anyone for any reason.
That's the assumption you should make about any site in the absence of a privacy policy. Wikimedia has a privacy policy that clearly states the reasons they'll reveal information, should people really assume the WMF won't follow their own policy?
Wikimedia has a privacy policy that states the conditions under which they'll "permit public distribution" of private information. But what about private distribution? They say they won't share private information with "third parties" without permission or as required by law. But either they openly violate that provision, or they have defined "third parties" so narrowly as to be meaningless.
Anthony, you need to review the workings of a checkuser. I am not a CU, but I know for a fact that it is very, very easy to avoid autoblocks and using the same IPs as an original account. All you have to do is change your IP. The thing is, most IPs are allocated in /24 or /16 groupings (sometimes you get smaller groupings), but you can say xxx.xxx.xxx.0 through xxx.xxx.xxx.255 are the same provider and, most of the time, the same regional area (city/region). That is used as part of the proof checkusers use to prove socking, along with user agents. Socking is more than "is this just the same IP?". Many, many users use shared IPs, but that does not make them socks. That takes a full investigation: having a list of all recent IPs used by a user, user agents, edit times, and edit contents is all part of the complex art of proving two accounts are the same human. It also allows for detecting open proxies and indef-blocking them instead of just allowing an autoblock. I would recommend that everyone who has no clue about the checkuser process shut up and stop putting their mouth where it does not belong, because most users are clueless about the complexity of checkuser information and what needs to be done to conclusively prove two accounts are used by the same user.
Betacommand
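As a rough illustration of the /24 and /16 groupings mentioned above, the sketch below checks whether two IPv4 addresses fall in the same block, using Python's standard ipaddress module. The function name same_block and the sample addresses (taken from reserved documentation and private ranges) are assumptions made for this example; nothing here reflects how CheckUser itself is implemented.

```python
import ipaddress


def same_block(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Return True if both IPv4 addresses fall in the same /prefix_len block.

    Hypothetical helper for illustration only.
    """
    # strict=False lets us pass a host address and get its enclosing network.
    block = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in block


# Two hosts in the same /24 (xxx.xxx.xxx.0 through xxx.xxx.xxx.255).
print(same_block("192.0.2.10", "192.0.2.200"))
# Two hosts in different /24 blocks.
print(same_block("192.0.2.10", "198.51.100.7"))
# Different /24 blocks, but the same /16.
print(same_block("10.1.2.3", "10.1.200.9", prefix_len=16))
```

A match on the block only suggests a shared provider or region; as the message says, shared IPs alone do not make two accounts socks.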
On Sat, Sep 20, 2008 at 1:33 PM, Betacommand Betacommand@gmail.com wrote:
Anthony, you need to review the workings of a checkuser. I am not a CU, but I know for a fact that it is very, very easy to avoid autoblocks and using the same IPs as an original account. All you have to do is change your IP.
Huh? My block log isn't quite as extensive as yours, so maybe you're more familiar with autoblocks than I am, but your comment doesn't make any sense. You can use the same IP, as long as you change your IP?
I would recommend that everyone who has no clue about the checkuser process shut up and stop putting their mouth where it does not belong, because most users are clueless about the complexity of checkuser information and what needs to be done to conclusively prove two accounts are used by the same user.
I'm fairly well aware of the complexity of checkuser information and what needs to be done to conclusively prove two accounts are used by the same person. What I'm not aware of is why this is so important in the first place.
On Sat, Sep 20, 2008 at 11:06 AM, Anthony wikimail@inbox.org wrote:
On Sat, Sep 20, 2008 at 1:33 PM, Betacommand Betacommand@gmail.com wrote:
Anthony, you need to review the workings of a checkuser. I am not a CU, but I know for a fact that it is very, very easy to avoid autoblocks and using the same IPs as an original account. All you have to do is change your IP.
Huh? My block log isn't quite as extensive as yours, so maybe you're more familiar with autoblocks than I am...
OOOOOOHHHHHHH SNAP! --~~~
Anthony, you also forget that I am a former admin and have 80k+ edits to en wiki alone. You have no fucking clue what you're talking about. Autoblocks have been proven very, very ineffective. All one has to do to avoid an autoblock is either reset their router or hang up a dial-up connection and reconnect. Why do you think we have range blocks? They are key parts in stopping prolific vandals and sock puppets. Instead of everyone complaining about straw men, ask the checkusers how much info they need to do what they need to do.
Well, that was certainly uncalled for.
On Sun, Sep 21, 2008 at 8:24 AM, John Doe phoenixoverride@gmail.com wrote:
Anthony, you also forget that I am a former admin and have 80k+ edits to en wiki alone. You have no fucking clue what you're talking about. Autoblocks have been proven very, very ineffective. All one has to do to avoid an autoblock is either reset their router or hang up a dial-up connection and reconnect. Why do you think we have range blocks? They are key parts in stopping prolific vandals and sock puppets. Instead of everyone complaining about straw men, ask the checkusers how much info they need to do what they need to do.
Anthony, the prior post was me and was completely called for. You are attempting to dictate policy in an area where you have no clue what you're doing. If I'm blunt, so be it; someone needs to say it how it is.
The tone was uncalled for and is not going to be welcome here. The information itself was good. You need to figure out either how to share information in a more acceptable tone or how to leave it to others to correct any misconceptions.
Birgitte SB
--- On Sun, 9/21/08, Betacommand Betacommand@gmail.com wrote:
From: Betacommand Betacommand@gmail.com Subject: Re: [Foundation-l] Data retention To: "Wikimedia Foundation Mailing List" foundation-l@lists.wikimedia.org Date: Sunday, September 21, 2008, 7:36 AM
Anthony, the prior post was me and was completely called for. You are attempting to dictate policy in an area where you have no clue what you're doing. If I'm blunt, so be it; someone needs to say it how it is.
What information was good? What information was there?
The tone was uncalled for and is not going to be welcome here. The information itself was good. You need to figure out either how to share information in a more acceptable tone or how to leave it to others to correct any misconceptions.
Anthony, the prior post was me and was completely called for. You are attempting to dictate policy in an area where you have no clue what you're doing. If I'm blunt, so be it; someone needs to say it how it is.
Enough is enough, from both sides.
The initial question was asked, and subsequently answered. If anyone wants to actually propose a change to the data retention policy, there was ample opportunity for comments when it came up in May this year. However, I'm sure the Board would be happy to listen to any suggestions.
If anyone wants to suggest changes to the CheckUser policy, how it works, what information checkusers should have access to, etc., feel free to propose it on meta.
KTC
On Sun, Sep 21, 2008 at 1:24 PM, John Doe phoenixoverride@gmail.com wrote:
Anthony, you also forget that I am a former admin...
Well by reading the post, I can kind of see why.
--- On Sun, 9/21/08, Al Tally majorly.wiki@googlemail.com wrote:
From: Al Tally majorly.wiki@googlemail.com Subject: Re: [Foundation-l] Data retention To: "Wikimedia Foundation Mailing List" foundation-l@lists.wikimedia.org Date: Sunday, September 21, 2008, 7:46 AM On Sun, Sep 21, 2008 at 1:24 PM, John Doe phoenixoverride@gmail.com wrote:
Anthony, you also forget that I am a former admin...
Well by reading the post, I can kind of see why.
And fanning the flame like this is just as unhelpful as the tone of the original message. If you must share a personal remark like this, please send it off-list. No one wants this list to become an audience to perpetual sniping.
Birgitte SB
Betacommand wrote:
Anthony, you need to review the workings of a checkuser. I am not a CU, but I know for a fact that it is very, very easy to avoid autoblocks and using the same IPs as an original account. All you have to do is change your IP. The thing is, most IPs are allocated in /24 or /16 groupings (sometimes you get smaller groupings), but you can say xxx.xxx.xxx.0 through xxx.xxx.xxx.255 are the same provider and, most of the time, the same regional area (city/region). That is used as part of the proof checkusers use to prove socking, along with user agents. Socking is more than "is this just the same IP?". Many, many users use shared IPs, but that does not make them socks. That takes a full investigation: having a list of all recent IPs used by a user, user agents, edit times, and edit contents is all part of the complex art of proving two accounts are the same human. It also allows for detecting open proxies and indef-blocking them instead of just allowing an autoblock. I would recommend that everyone who has no clue about the checkuser process shut up and stop putting their mouth where it does not belong, because most users are clueless about the complexity of checkuser information and what needs to be done to conclusively prove two accounts are used by the same user.
Betacommand
Watch the way you word things; don't be offensive.
-- Best, Jon
[User:NonvocalScream]
Anthony wrote:
I don't see why checkuser can't be watered down, to simply provide a "YES/NO" answer, and that two or three WMF employees can have access to the actual IP addresses (with all but one of those employees only having it for backup/emergency purposes). Unlike in the past, blocking of a username automatically blocks the IP address, even if the user doesn't try to make any edits, so isn't checkuser basically just for investigations of ongoing abuse which can wait a few days for a WMF person to get around to it?
It would only work in the most obvious of sock puppetry cases, and could still be wrong. If the person doing the check can't see the IP, they have no way of knowing if it's an open proxy or Tor node, a dynamic IP, a shared IP, a static IP, etc. They just know that the users are using the same IP (which is presumably what would trigger a YES). What if the users were in the same range of addresses? What if they have the same user-agent or XFF but different IPs? What if they are on the same IP but with a different user-agent or XFF? Or a different, but similar user-agent?
What if we can't wait a few days to get the IP address, say for a user using proxies or dynamic IPs to avoid autoblocks (see Spacebirdy's userrights/globalblock/global account logs on meta for an example)? An autoblock would only affect one wiki, then they go to another one to create more accounts. Or what if they have other accounts that they registered, but haven't used yet?
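To make the point about a bare YES/NO answer concrete, here is a minimal sketch that compares two records on several signals instead of one. The Record type, its fields, and compare_records are hypothetical names chosen for illustration; they do not reflect how the CheckUser tool stores or compares data, and XFF is left out for brevity.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical per-edit record: source IP and browser user-agent."""
    ip: str
    user_agent: str


def compare_records(a: Record, b: Record, prefix_len: int = 24) -> dict:
    """Report which signals two records share (illustrative only)."""
    addr_a = ipaddress.ip_address(a.ip)
    addr_b = ipaddress.ip_address(b.ip)
    # strict=False derives the enclosing block from a host address.
    block_a = ipaddress.ip_network(f"{a.ip}/{prefix_len}", strict=False)
    return {
        "same_ip": addr_a == addr_b,
        "same_range": addr_b in block_a,
        "same_user_agent": a.user_agent == b.user_agent,
    }


first = Record("192.0.2.10", "ExampleBrowser/1.0")
second = Record("192.0.2.77", "ExampleBrowser/1.0")
print(compare_records(first, second))
```

In this example a check on exact IP equality alone would answer NO, even though the two records share a /24 range and a user-agent, which is the kind of case raised above.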
On Sat, Sep 20, 2008 at 3:28 PM, Alex mrzmanwiki@gmail.com wrote:
Anthony wrote:
I don't see why checkuser can't be watered down, to simply provide a "YES/NO" answer, and that two or three WMF employees can have access to the actual IP addresses (with all but one of those employees only having it for backup/emergency purposes). Unlike in the past, blocking of a username automatically blocks the IP address, even if the user doesn't try to make any edits, so isn't checkuser basically just for investigations of ongoing abuse which can wait a few days for a WMF person to get around to it?
It would only work in the most obvious of sock puppetry cases, and could still be wrong.
At least it'd be a quick first check to spare the WMF person from handling those obvious cases. But see below...
What if we can't wait a few days to get the IP address, say for a user using proxies or dynamic IPs to avoid autoblocks (see Spacebirdy's userrights/globalblock/global account logs on meta for an example)? An autoblock would only affect one wiki, then they go to another one to create more accounts. Or what if they have other accounts that they registered, but haven't used yet?
Most of those problems could be solved without revealing the actual IP address (you don't need to know an IP address to be able to block it), but I'm going to concede that I haven't thought through all the possible ramifications to be sure that there wouldn't be any holes left open.
I plead no contest. Thanks for pointing out the naiveté of my simple solution.
2008/9/20 Anthony wikimail@inbox.org:
Just so they can help someone they consider to be a "particularly troublesome user" escape legal consequences for her actions?
Legal consequences? I don't understand... the result of being caught by a checkuser is usually a block, I wouldn't call that a legal consequence.
No, you were talking about subpoenas...I guess I misread you.
Ah, I see! My misunderstanding there, sorry. Firstly, I would assume anyone having subpoenas issued about that is innocent, since they haven't been convicted yet. Secondly, I have no problem with checkusers complying with subpoenas, but the new privacy policy commits the WMF to trying to notify the subject and to hold off on complying if the subpoena is challenged in court - would individual checkusers do the same?