[Foundation-l] Data retention

Tim Starling tstarling at wikimedia.org
Fri Sep 19 00:33:36 UTC 2008


Multiple replies below.

Charlotte Webb wrote:
> But I can say that the numbers game becomes less laughable on smaller
> projects. Let's take the [[Hungarian Wikinews]] for example, which had
> only 374 edits in an equal time-span[5][6].
> 
> So on this project there would be a 2.331 percent chance of
> over-retaining checkuser data in violation of the privacy policy[7].

Keeping data for more than 6 months does not violate the privacy policy.
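For anyone checking the arithmetic behind the 2.331 percent figure: it is the chance that none of 374 edits triggers the 1-in-100 prune. Here is a minimal Python sketch. It assumes each edit prunes independently with probability 1/100, and the function name is made up for illustration:

```python
# Assumption: each edit independently runs the prune with probability 1/100,
# matching the `0 == mt_rand( 0, 99 )` test (100 equally likely values, one hit).
PRUNE_PROBABILITY = 1 / 100


def chance_of_no_prune(edits: int, p: float = PRUNE_PROBABILITY) -> float:
    """Probability that `edits` consecutive edits never trigger a prune."""
    return (1 - p) ** edits


# Hungarian Wikinews example: 374 edits in the time span under discussion.
print(f"{chance_of_no_prune(374):.3%}")
```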

Thomas Dalton wrote:
> 0.99^6500000=5.819478586x10^-28372

Note that if you don't have an arbitrary-precision calculator handy, logarithms get you most of the way:

0.99^6500000 = 10^(6500000 * log_10(0.99)) ~= 10^-28371.
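The same logarithm trick, sketched in Python. A plain floating-point power underflows to zero, so this splits the base-10 logarithm into an integer exponent and a mantissa. The helper name is hypothetical:

```python
import math


def power_via_log10(base: float, exponent: int) -> tuple[float, int]:
    """Return (mantissa, exp10) such that base**exponent == mantissa * 10**exp10.

    Works in log space, so it avoids the underflow a direct float power hits.
    """
    log10_value = exponent * math.log10(base)
    exp10 = math.floor(log10_value)           # integer part of the exponent
    mantissa = 10 ** (log10_value - exp10)    # fractional part -> value in [1, 10)
    return mantissa, exp10


print(0.99 ** 6500000)  # underflows to 0.0 in IEEE double precision
m, e = power_via_log10(0.99, 6500000)
print(f"{m:.4f} x 10^{e}")
```

Only math.log10 and math.floor are involved, and the printed result should match Thomas's figure to the digits shown.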

Anthony wrote:
> I thought the checkuser data was moved out of the recentchanges table.

Yes, over a year ago:
http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=21016

But we're not talking about the recentchanges table. The cu_changes table
is purged in exactly the same way:

    # Every 100th edit, prune the checkuser changes table.
    if( 0 == mt_rand( 0, 99 ) ) {
        # Periodically flush old entries from the recentchanges table.
        global $wgCUDMaxAge;
        $cutoff = $dbw->timestamp( time() - $wgCUDMaxAge );
        $recentchanges = $dbw->tableName( 'cu_changes' );
        $sql = "DELETE FROM $recentchanges WHERE cuc_timestamp < '{$cutoff}'";
        $dbw->query( $sql );
    }

In fact, it's so similar that whoever added it forgot to rename a variable
($recentchanges) and update a comment when copying it from the recentchanges code.
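To make the mechanism concrete, here is a rough Python model of that prune-on-write pattern. It is not the MediaWiki code. An in-memory list of Unix timestamps stands in for cu_changes, and MAX_AGE is a hypothetical stand-in for $wgCUDMaxAge with an arbitrary value:

```python
import random

# Hypothetical stand-in for $wgCUDMaxAge; the value is illustrative only.
MAX_AGE = 90 * 86400  # seconds


def record_edit(rows: list[int], now: int, rng: random.Random) -> bool:
    """Append an edit timestamp to `rows` (a stand-in for cu_changes).

    On roughly one call in 100, delete every row older than now - MAX_AGE,
    mirroring the DELETE ... WHERE cuc_timestamp < cutoff query.
    Returns True if a prune ran on this call.
    """
    rows.append(now)
    if rng.randint(0, 99) == 0:  # same 1-in-100 test as mt_rand( 0, 99 )
        cutoff = now - MAX_AGE
        rows[:] = [t for t in rows if t >= cutoff]  # keep rows at or after cutoff
        return True
    return False


rng = random.Random(42)
table: list[int] = []
for t in range(0, 200 * 86400, 3600):  # one edit per hour for 200 days
    record_edit(table, t, rng)
```

Because the prune only runs on about one edit in a hundred, stale rows can outlive the cutoff until the next edit that happens to trigger it. On a quiet wiki that gap can be long, which is what the probability figures above are about.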

Joe Szilagyi wrote:
> That is what has been said around the chatter lines. Was this documented in
> the SVN somewhere if so, and approved? For all Wikis? Just some?

Are you implying that this change could somehow be controversial? If so,
can you explain how that might be?

-- Tim Starling