Realistically, if a government in a country that hosts one of the WMF data centers decides that they want unfiltered access to the data, I'm not sure how much WMF could do about it. I won't speculate on what kind of defenses WMF might have against that scenario, but I would encourage Analytics, Legal, and Security to have that conversation if they have not already done so. (The US government is not the only government that might engage in this kind of mass surveillance, and such a government may or may not use legal means to accomplish their objectives; other options include various kinds of phishing and social engineering attacks.)

Returning to the earlier discussion about limiting the number of people who have access to raw IPs and related data: I like the idea of hashing and/or geolocating the data and giving researchers that processed data rather than the raw data. I would be more comfortable with people who are neither WMF employees nor community checkusers having access to the processed data than to true IP addresses, UAs, and similar kinds of data.
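
To make the hashing idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the function names are hypothetical, and I am assuming a keyed hash (HMAC-SHA256) with a secret key that researchers never see, plus truncation to a /24 (IPv4) or /48 (IPv6) prefix as a stand-in for coarse geolocation. I am not describing anything WMF actually runs. The reason for a keyed hash rather than a plain one is that the IPv4 space is small enough that an unkeyed hash of an address can be reversed by simply hashing every possible address.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Return a keyed hash of an IP address (hypothetical helper).

    Anyone without secret_key cannot rebuild the mapping by hashing
    every possible address, which they could do with a plain hash.
    """
    # Normalize first so different spellings of one address hash the same.
    normalized = ipaddress.ip_address(ip).compressed
    digest = hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()


def coarsen_ip(ip: str) -> str:
    """Reduce an address to its /24 (IPv4) or /48 (IPv6) network.

    The prefix lengths are an assumption chosen for illustration.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


if __name__ == "__main__":
    key = b"example-key-kept-away-from-researchers"  # placeholder value
    print(pseudonymize_ip("192.0.2.77", key))
    print(coarsen_ip("192.0.2.77"))
```

With something like this, researchers could still count distinct visitors or group requests by network without ever seeing a true address. Whoever holds the key can still link hashes back to addresses, so the key would need the same protection as the raw data, and rotating it periodically would limit how long any one pseudonym can be tracked.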

Pine

On Fri, Nov 11, 2016 at 1:58 PM, C. Scott Ananian <cananian@wikimedia.org> wrote:

On Fri, Nov 11, 2016 at 2:16 PM, Leila Zia <leila@wikimedia.org> wrote:
* Subpoena-related concerns: the best way to handle this from the data storage perspective is to not have the data at all. That is why very sensitive data in the webrequest logs is currently purged after 60 days. As Nuria said, this length of time may be shortened a little, but because of operational constraints we cannot avoid storing this data altogether.

It is worth considering this in the context of https://twitter.com/Pinboard/status/797167026481442816

That is, not storing the data is nice, but do we have any plans in place in case a government decides to place a recording device in our data center beside our servers? We may have the best of intentions, but "we don't store it" could in fact be misleading comfort if there is a third party who *is* storing it.

This is perhaps a broader question (and more in line with James' initial inquiry?), as it suggests that we reconsider what sort of protections we can actually provide to our editors, and make sure they know if we can't protect them from state-level monitoring.
--scott
--
(http://cscott.net)

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics