Would it relieve some of the concerns if we limited
publishing of subnational data to particularly large countries, like the United States,
and particularly large projects, like the English Wikipedia?
The size of the project is irrelevant. Even on wp:en it would be rather trivial to find
the geo data for any very active editor, by matching timestamps in the squid log with
timestamps in the dump or recent changes list. Of course we don't publish squid logs.
But let us assess the risk for the case where data do leak or are otherwise exposed. In
that case it is important that those geo data are *sufficiently non-specific*. For me
that's the issue we should focus on.
--
The list of city names which MaxMind keeps track of is limited (
http://www.maxmind.com/GeoIPCity-534-Location.csv ). Of course it may expand.
We would store it locally, like we do with the country and continent lookup lists, and
could manually vet whether cities have more than, say, 100,000 people.
So we could build a white list from it which expands over time. Of course that would be
another lookup.
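A minimal sketch of such a white list lookup, assuming the vetting has already been done
by hand. The names VETTED_CITIES and publishable_city and the sample entries are made up
for illustration; nothing here reflects the actual MaxMind file layout or our lookup code.

```python
from typing import Optional

# Hypothetical, manually vetted (country code, city) pairs that passed the
# population check. The entries are illustrative only.
VETTED_CITIES = {
    ("US", "Boston"),
    ("NL", "Amsterdam"),
}


def publishable_city(country: str, city: Optional[str]) -> Optional[str]:
    """Return the city name only if it is on the white list, else None.

    A None result means the record should fall back to country level.
    """
    if city is not None and (country, city) in VETTED_CITIES:
        return city
    return None
```

Anything not on the list, including cities MaxMind adds later, stays at country level
until someone vets it.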
As for latitude/longitude, again, these should be rounded on purpose.
If we round to 0.5 degree, this gives an east-west resolution of around 55 km (30 nautical
miles, about 34 statute miles) at the equator, and 22 km (12 nautical miles, about 14
statute miles) at the Arctic Circle. The north-south resolution stays around 55 km
everywhere.
(Again, state or region lookup might be too costly anyway, but that is another matter.)
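To make the rounding concrete, here is a rough sketch under a spherical-Earth assumption
(mean radius 6371 km). The function names round_to_grid and east_west_km are made up for
illustration and are not part of any existing pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def round_to_grid(value_deg: float, step_deg: float = 0.5) -> float:
    """Snap a latitude or longitude to the nearest multiple of step_deg."""
    return round(value_deg / step_deg) * step_deg


def east_west_km(step_deg: float, latitude_deg: float) -> float:
    """Width in km of step_deg of longitude at the given latitude."""
    return (
        math.radians(step_deg)
        * EARTH_RADIUS_KM
        * math.cos(math.radians(latitude_deg))
    )


# Width of a 0.5 degree cell at the equator and at the Arctic Circle.
equator_km = east_west_km(0.5, 0.0)
arctic_km = east_west_km(0.5, 66.56)
```

With these assumptions equator_km comes out between 55 and 56 km and arctic_km slightly
above 22 km, so the grid cells get narrower, and thus more specific, towards the poles.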
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of James Hare
Sent: Wednesday, August 14, 2013 12:13 AM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] U.S. state-level editor retention data
On Aug 13, 2013, at 6:06 PM, Luis Villa <lvilla(a)wikimedia.org> wrote:
On Tue, Aug 13, 2013 at 1:45 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
And we already have some aggregated data about editors on stats.wikimedia.org squid
reports, so it's surely not a privacy issue.
I'd be worried about using aggregation as a cure-all, when, as others have pointed out,
we have some very small wikis. But it can be done, especially when you check to make sure
that (at whatever granularity you use for the geodata and timestamps) the resulting
aggregated sets are always reasonably large.
Luis
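The size check described above could look roughly like this. It is only a sketch: the
function name aggregate_with_threshold, the (wiki, region) record shape, and the
threshold value are assumptions, with one record per editor.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregate_with_threshold(
    records: Iterable[Tuple[str, str]], min_count: int
) -> Dict[Tuple[str, str], int]:
    """Count editors per (wiki, region) and drop groups below min_count.

    Each record is assumed to represent exactly one editor.
    """
    counts = Counter(records)
    return {key: n for key, n in counts.items() if n >= min_count}


# Illustrative data only: 12 editors in one group, 2 in another.
sample = [("enwiki", "US-CA")] * 12 + [("smallwiki", "US-WY")] * 2
published = aggregate_with_threshold(sample, min_count=10)
```

Here only the ("enwiki", "US-CA") group would be published; the two-editor group is
suppressed rather than reported.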
Would it relieve some of the concerns if we limited publishing of subnational data to
particularly large countries, like the United States, and particularly large projects,
like the English Wikipedia?
James
--
Luis Villa
Deputy General Counsel
Wikimedia Foundation
415.839.6885 ext. 6810
NOTICE: This message may be confidential or legally privileged. If you have received it by
accident, please delete it and let us know about the mistake. As an attorney for the
Wikimedia Foundation, for legal/ethical reasons I cannot give legal advice to, or serve as
a lawyer for, community members, volunteers, or staff members in their personal capacity.
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics