[Wikimedia-l] PRISM

MZMcBride z at mzmcbride.com
Mon Jun 10 23:18:30 UTC 2013

David Gerard wrote:
>On 10 June 2013 18:01, Rand McRanderson <therandshow at gmail.com> wrote:
>> I think the key here is not to keep more information about users than
>> necessary.
>In particular - at present, as I understand it, we don't keep full
>access logs, just 1/1000 samples.
>We need to not keep full access logs.

I'm not sure about access log retention. I know what used to be true (that
we didn't and frankly couldn't keep full access logs), but I'm not sure
what the current situation is.

Related to this, however, is a broader point about hiding versus deleting
information. We, as a community, have gotten into a pattern of hiding
(suppressing) information in our databases rather than simply removing it
outright. This has advantages (chiefly reversibility), but the practice of
sweeping information under the rug rather than taking out the trash can,
and inevitably will, cause issues. Truly problematic usernames, edits, and
logs really ought to be deleted, not simply suppressed, in my opinion.

This has come up in the context of database dumps and database
replication. By retaining this information indefinitely (including
usernames that out individuals, CheckUser logs, content buried inside page
histories, etc.), we're basically asking for it to one day be leaked.


