The ratio of logged-in to logged-out readers can be inferred.
Think more carefully about whether reading patterns can be inferred for anonymous editors.
How to interpret the Do-Not-Track header is controversial.

As for DNT, my main concern from the research perspective is whether interpreting DNT as exclusion from geo-aggregation would reduce the sample size excessively. Luis Villa’s link for Firefox numbers shows, for the desktop version, a peak of 11% in March 2013 declining to 8% at the end of the data in September 2014, and for mobile users a 17% peak in July 2012 with a similar decline to 5% in September 2014. With numbers like these, I believe the larger sample (i.e., DNT hits included in geo-aggregation) would support somewhat more robust results, but the smaller sample (DNT hits excluded) is fine. I am somewhat worried about growth in DNT usage, but as long as DNT is not enabled by default, that’s probably not a major concern.
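
To put rough numbers on "the smaller sample is fine", here is a back-of-the-envelope sketch. It rests on assumptions that are not in the data: that the Firefox DNT shares above apply to all readers, that DNT users read the same pages as everyone else, and that per-cell hit counts behave like Poisson counts, so that relative error scales as 1/sqrt(n). The function names are made up for illustration.

```python
import math


def retained_fraction(dnt_share):
    """Fraction of hits left if DNT hits are excluded from aggregation."""
    return 1.0 - dnt_share


def relative_error_inflation(dnt_share):
    """Factor by which relative error grows after exclusion.

    Assumes Poisson-like counts, where relative error ~ 1/sqrt(n).
    """
    return 1.0 / math.sqrt(retained_fraction(dnt_share))


# DNT shares quoted above (Firefox only; applying them to all readers
# is an assumption made for this sketch).
shares = {
    "desktop, peak (Mar 2013)": 0.11,
    "desktop, Sep 2014": 0.08,
    "mobile, peak (Jul 2012)": 0.17,
    "mobile, Sep 2014": 0.05,
}

for label, share in shares.items():
    print(f"{label}: keep {retained_fraction(share):.0%}, "
          f"error x{relative_error_inflation(share):.3f}")
```

Under those assumptions, excluding DNT hits inflates relative error by roughly 3 to 10 percent across the quoted shares, which is consistent with the smaller sample being workable.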

One thing that I would really like feedback on is: what is an acceptable k — i.e., how large is the set of users from whom a specific user is indistinguishable? I believe this will have a significantly greater impact on the quality of our results than DNT.
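
To make the question concrete, here is a minimal sketch of one simple reading of k: publish a (page, region) cell only if at least k distinct users contributed to it. This is not necessarily the aggregation strategy in the proposal; the function name, the (user_id, page, region) hit format, and the sample data are all hypothetical.

```python
from collections import defaultdict


def geo_aggregate(hits, k):
    """Count views per (page, region) and suppress small cells.

    hits: iterable of (user_id, page, region) tuples (hypothetical format).
    k:    minimum number of distinct users a cell needs to be published.
    """
    views = defaultdict(int)
    users = defaultdict(set)
    for user_id, page, region in hits:
        cell = (page, region)
        views[cell] += 1
        users[cell].add(user_id)
    # Keep only cells in which each user hides among at least k users.
    return {cell: count for cell, count in views.items()
            if len(users[cell]) >= k}


# Made-up sample data, for illustration only.
sample_hits = [
    ("u1", "Influenza", "NM"),
    ("u2", "Influenza", "NM"),
    ("u3", "Influenza", "NM"),
    ("u1", "Influenza", "NM"),
    ("u4", "Dengue_fever", "NM"),
]
published = geo_aggregate(sample_hits, k=3)
print(published)
```

In this sketch a larger k suppresses more cells, which is why the choice of k bears directly on how much usable data survives for rarely viewed pages or sparsely populated regions.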

On 13 January 2015 at 07:26, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:

I’m sharing a proposal that Reid Priedhorsky and his collaborators at Los Alamos National Laboratory recently submitted to the Wikimedia Analytics Team aimed at producing privacy-preserving geo-aggregates of Wikipedia pageview data dumps and making them available to the public and the research community. [1]

Reid and his team spearheaded the use of the public Wikipedia pageview dumps to monitor and forecast the spread of influenza and other diseases, using language as a proxy for location. [2] This proposal describes an aggregation strategy adding a geographical dimension to the existing dumps.

Feedback on the proposal is welcome on the lists or on the project talk page on Meta. [3]

Dario

[1] https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
[2] http://dx.doi.org/10.1371/journal.pcbi.1003892
[3] https://meta.wikimedia.org/wiki/Research_talk:Geo-aggregation_of_Wikipedia_pageviews