Gotcha. Reading that proposal, it appears to be a proposal for a
methodology that will enable future proposals; where are the future
proposals? It also says "in many countries, disease monitoring must be
carried out at the state or metro-area level" - which countries have
to be metro-level? Who are we risking the entire reader population
for, here? Is it one country, or ten, or?

For what it's worth, I love the idea of this kind of live stream. But I
want to make sure that the way the various chunks are being prioritised
correlates with how critical they are to the outside world - and with
the underlying data's sensitivity, at that. If we're introducing risks
by going down to city level and the actual use cases for city-level
data are limited, let's not do that - but this proposal doesn't provide
thoughts on how limited those use cases are. It just says that it's
required in some countries.
On 5 June 2015 at 09:35, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
My only thought is that "city" makes me uncomfortable. Did we track
down a precise use case for that in the end?
Yes, the Los Alamos National Lab folks' proposal:
https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pagev…
We talked to them yesterday and it seems the time granularity is not as
important. That's why that dataset is *daily* and the other one is
*hourly*. Either way, these will be k-anonymized at any level. Once we
have some data up, though, I'd love for people who are good at this to try
and attack the datasets in combination and from different points of view
like t-closeness, etc. I don't want to leak any info and any help on that
is appreciated 'cause it's a hard problem.
--
Oliver Keyes
Research Analyst
Wikimedia Foundation