Re: [Analytics] ensuring reader anonymity

12 Nov 2016


Realistically, if a government in a country that hosts one of the WMF data
centers decides that it wants unfiltered access to the data, I'm not sure
how much WMF could do about it. I won't speculate on what kind of defenses
WMF might have against that scenario, but I would encourage Analytics,
Legal, and Security to have that conversation if they have not already done
so. (The US government is not the only government that might engage in this
kind of mass surveillance, and such a government may or may not use legal
means to accomplish its objectives; other options include various kinds
of phishing and social engineering attacks.)

Returning to previous discussions about limiting the number of people who
have access to raw IPs and related data, I'm thinking that I like the idea
of hashing the data and/or geolocating the data and then giving that
processed data to researchers, rather than letting researchers have the raw
data. I would be more comfortable with people who are not WMF employees and
not community checkusers having access to the processed data than to true
IP addresses, UAs, and other similar kinds of data.
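
To make the "processed data" idea concrete, here is a minimal sketch of what
such a preprocessing step could look like. It is an illustration only, not a
description of any existing WMF pipeline: the record fields (ip, user_agent,
uri) and the function names are hypothetical, and network-prefix truncation
stands in for real geolocation, which would need an external database. A keyed
hash (HMAC) is used rather than a plain hash because the IPv4 address space is
small enough that unkeyed hashes can be reversed by brute force.

```python
import hashlib
import hmac
import ipaddress
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of value (HMAC), as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def coarsen_ip(ip: str) -> str:
    """Truncate an address to its /24 (IPv4) or /48 (IPv6) network.

    This is a stand-in for geolocation: it keeps only coarse network
    information and discards the host-specific bits.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def process_record(record: dict, key: bytes) -> dict:
    """Build the researcher-facing record; raw IP and UA are not copied."""
    return {
        "ip_hash": pseudonymize(record["ip"], key),
        "ip_network": coarsen_ip(record["ip"]),
        "ua_hash": pseudonymize(record["user_agent"], key),
        "uri": record["uri"],
    }


if __name__ == "__main__":
    # Assumption: the key is generated and held by whoever runs the
    # preprocessing, and is never shared with the data recipients.
    secret_key = secrets.token_bytes(32)
    raw = {
        "ip": "203.0.113.77",  # documentation-range address
        "user_agent": "ExampleBrowser/1.0",
        "uri": "/wiki/Example",
    }
    print(process_record(raw, secret_key))
```

With this approach the same IP maps to the same hash only while the key stays
the same, so rotating or discarding the key also limits how long records can
be linked to each other.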
Pine

On Fri, Nov 11, 2016 at 1:58 PM, C. Scott Ananian cananian@wikimedia.org wrote:
...
On Fri, Nov 11, 2016 at 2:16 PM, Leila Zia leila@wikimedia.org wrote:
...

Subpoena-related concerns: the best way to handle this from the data
storage perspective is to not have the data at all. That is why very
sensitive data is purged after 60 days at the moment in webrequest logs. As
Nuria said, this length of time may be shortened a little, but because of
operational constraints, if nothing else, we can't avoid storing this data
entirely.
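
As a rough illustration of a 60-day purge like the one described above, the
sketch below strips sensitive fields from records older than the retention
window. Everything except the 60-day figure is an assumption: the field names,
the record layout, and the choice to drop fields rather than whole rows are
hypothetical and are not taken from the actual webrequest tooling.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=60)           # figure quoted in the thread
SENSITIVE_FIELDS = ("ip", "user_agent")  # hypothetical field names


def purge_sensitive(records, now):
    """Return copies of records, dropping sensitive fields once a record
    is older than the retention window. Newer records pass through as-is."""
    cutoff = now - RETENTION
    result = []
    for record in records:
        if record["timestamp"] < cutoff:
            record = {
                key: value
                for key, value in record.items()
                if key not in SENSITIVE_FIELDS
            }
        result.append(record)
    return result


if __name__ == "__main__":
    now = datetime(2016, 11, 12, tzinfo=timezone.utc)
    logs = [
        {"timestamp": now - timedelta(days=61), "ip": "203.0.113.77",
         "user_agent": "ExampleBrowser/1.0", "uri": "/wiki/Old"},
        {"timestamp": now - timedelta(days=1), "ip": "198.51.100.5",
         "user_agent": "ExampleBrowser/1.0", "uri": "/wiki/New"},
    ]
    for row in purge_sensitive(logs, now):
        print(sorted(row))
```

Note that a purge of this kind only covers what is stored; it says nothing
about traffic that is observed in transit.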
It is worth considering this in the context of
https://twitter.com/Pinboard/status/797167026481442816
That is, not storing the data is nice, but do we have any plans in place
in case a government decides to place a recording device in our data center
beside our servers? We may have the best of intentions, but "we don't
store it" could in fact be misleading comfort if there is a third party who
*is* storing it.
This is perhaps a broader question (and more in line with James' initial
inquiry?), as it suggests that we reconsider what sort of protections we
can actually provide to our editors, and make sure they know if we can't
protect them from state-level monitoring.
--scott
(http://cscott.net)

Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
