Hi Dario, Reid,
This seems sensible enough, and proposal #3 is clearly the better
approach. An explicit opt-in/opt-out mechanism would not be worth the
effort to build and would become yet another ignored preferences
setting after a few weeks...
A couple of thoughts:
* I understand the reasoning for not using do-not-track headers (#4);
however, it feels a bit odd to say "they probably don't mean us" and
skip them... I can almost guarantee you'll have at least one person
making a vocal fuss about not being able to opt out without an
account. If we were to honour these headers, would it make a
significant difference to the amount of data available? Would it be
likely to skew the data any more than leaving out logged-in users does?
* Option 3 does, however, release one further piece of information over
and above those listed: an approximate ratio of logged-in versus
non-logged-in pageviews for a page. I cannot see any particular
problem with this (and I can think of a couple of fun things to
use it for), but it's probably worth being aware of.
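To illustrate how that ratio could be derived, here is a rough Python sketch. It assumes (my reading, not something the proposal spells out) that option 3 leaves logged-in users out of the geo-aggregates while the existing dumps count every pageview. The function name and arguments are hypothetical.

```python
def logged_in_share(total_views: int, geo_counts: dict[str, int]) -> float:
    """Estimate the fraction of a page's views that came from logged-in users.

    Hypothetical inputs:
      total_views -- the page's count from the existing all-traffic dump
      geo_counts  -- per-location counts from a geo-aggregate that (by
                     assumption) excludes logged-in users
    """
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    # Views that appear in the geo-aggregate are the non-logged-in ones.
    anonymous = sum(geo_counts.values())
    # Whatever is missing from the geo-aggregate is attributed to logged-in users.
    return (total_views - anonymous) / total_views
```

For example, `logged_in_share(1000, {"US": 600, "GB": 300})` gives 0.1. Any rounding or dropping of small counts in the aggregates would make this only an approximation.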
On 13 January 2015 at 07:26, Dario Taraborelli
I’m sharing a proposal that Reid Priedhorsky and his
collaborators at Los Alamos National Laboratory recently submitted to the Wikimedia
Analytics Team aimed at producing privacy-preserving geo-aggregates of Wikipedia pageview
data dumps and making them available to the public and the research community. 
Reid and his team spearheaded the use of the public Wikipedia pageview dumps to monitor
and forecast the spread of influenza and other diseases, using language as a proxy for
location. This proposal describes an aggregation strategy adding a geographical dimension
to the existing dumps.
Feedback on the proposal is welcome on the lists or the project talk page on Meta 
Analytics mailing list
- Andrew Gray