> Would it relieve some of the concerns if we limited publishing of subnational data to particularly large countries, like the United States, and particularly large projects, like the English Wikipedia?

 

The size of the project is irrelevant. Even on wp:en it would be rather trivial to find the geo data for any very active editor by matching timestamps in the squid log with timestamps in the dump or the recent changes list. Of course we don't publish squid logs. But let us assess the risk for when data do leak or are otherwise exposed. Then it is important that those geo data are *sufficiently non-specific*. For me, that's the issue we should focus on.
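To make that correlation risk concrete, here is a minimal sketch of the kind of timestamp join someone could attempt if request logs ever leaked. The field names, log layout, and the five-second tolerance window are assumptions for illustration only, not the actual squid log format:

from datetime import datetime, timedelta

# Illustration only: the field names and log layout are hypothetical;
# real squid logs are not published, which is exactly the point.
TOLERANCE = timedelta(seconds=5)  # assumed matching window

def parse_ts(s):
    # Parse an ISO-8601 timestamp such as "2013-08-14T00:13:02Z".
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")

def correlate(request_rows, recent_changes):
    # Yield (user, client_ip) pairs whose timestamps fall within TOLERANCE.
    # request_rows:   dicts with "timestamp" and "client_ip" (hypothetical leaked log)
    # recent_changes: dicts with "timestamp" and "user" (public data)
    edits = [(parse_ts(rc["timestamp"]), rc["user"]) for rc in recent_changes]
    for row in request_rows:
        req_ts = parse_ts(row["timestamp"])
        for edit_ts, user in edits:
            if abs(edit_ts - req_ts) <= TOLERANCE:
                yield user, row["client_ip"]

For a very active editor, even a coarse join like this narrows things down quickly, which is why the published geo data need to stay coarse.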

 

--

 

The city names which MaxMind keeps track of form a limited list ( http://www.maxmind.com/GeoIPCity-534-Location.csv ). Of course it may expand.

We would store it locally, like we do with the country and continent lookup lists, and could manually vet whether cities have more than, say, 100,000 people.

So we could build a white list from it which expands over time. Of course that would be another lookup.
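As a rough illustration of how such a locally stored whitelist lookup could work (the whitelist file format and column names are assumptions; the >100,000-people vetting itself would be manual, as described above, not derived from the MaxMind CSV):

import csv

def load_vetted_cities(path):
    # Load the manually vetted whitelist of large cities,
    # assumed to be a CSV with "country" and "city" columns.
    with open(path, newline="", encoding="utf-8") as f:
        return {(row["country"], row["city"]) for row in csv.DictReader(f)}

def publishable_city(country, city, whitelist):
    # Return the city only if it is whitelisted; otherwise suppress it.
    return city if (country, city) in whitelist else ""

The extra lookup is just a set membership test, so the cost is mainly in maintaining the vetted list, not in applying it.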

 

As for latitude/longitude: again, these should be deliberately rounded.

If we round to 0.5 degrees, this gives an east-west resolution of around 55 km (34 mi) at the equator and 22 km (14 mi) at the arctic circle; north-south, 0.5 degrees is always around 55 km.
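A minimal sketch of that rounding step (the 0.5-degree step is the value proposed above; the example coordinates are arbitrary):

def round_coord(value, step=0.5):
    # Round a latitude or longitude to the nearest multiple of `step` degrees.
    return round(value / step) * step

# Example: (52.37, 4.90) becomes (52.5, 5.0)
print(round_coord(52.37), round_coord(4.90))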

 

(Again, state or region lookup might be too costly anyway, but that is another matter.)

 

Erik

 

 

 

From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of James Hare
Sent: Wednesday, August 14, 2013 12:13 AM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] U.S. state-level editor retention data

 

On Aug 13, 2013, at 6:06 PM, Luis Villa <lvilla@wikimedia.org> wrote:

 

 

On Tue, Aug 13, 2013 at 1:45 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:


And we already have some aggregated data about editors in the stats.wikimedia.org squid reports, so it's surely not a privacy issue.

 

I'd be worried about using aggregation as a cure-all when, as others have pointed out, we have some very small wikis. But it can be done, especially when you check to make sure that (at whatever granularity you use for the geodata and timestamps) the resulting aggregated sets are always reasonably large.
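As a sketch of that check, one could drop any aggregated bucket below a minimum size before publishing; the threshold here is an arbitrary placeholder, not a recommended value:

MIN_BUCKET_SIZE = 10  # arbitrary placeholder, not a recommended threshold

def publishable_buckets(buckets):
    # Keep only (geo/time bucket -> editor count) entries large enough to publish.
    return {key: count for key, count in buckets.items() if count >= MIN_BUCKET_SIZE}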

Luis

 

Would it relieve some of the concerns if we limited publishing of subnational data to particularly large countries, like the United States, and particularly large projects, like the English Wikipedia?

 

 

James

 





--

Luis Villa
Deputy General Counsel
Wikimedia Foundation
415.839.6885 ext. 6810

 

NOTICE: This message may be confidential or legally privileged. If you have received it by accident, please delete it and let us know about the mistake. As an attorney for the Wikimedia Foundation, for legal/ethical reasons I cannot give legal advice to, or serve as a lawyer for, community members, volunteers, or staff members in their personal capacity.

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics