Hi folks,
Reviving an old thread (my apologies for the delay). I’ve looked over this thread, the talk page linked below, and a few other places that seemed like they might have feedback for us.
It seems to me that the key feedback, in addition to some technical suggestions, was:
- The ratio of logged-in to logged-out readers can be inferred.
- We should think more carefully about whether reading patterns can be inferred for anonymous editors.
- How to interpret the Do-Not-Track header is controversial.
As for DNT, my main concern from the research perspective is whether interpreting DNT as exclusion from geo-aggregation would reduce the sample size excessively. Luis Villa’s link for Firefox numbers shows a peak of 11% in March 2013, declining to 8% at the end of the data in September 2014, for the desktop version, and a 17% peak in July 2012 with a similar decline to 5% in September 2014 for mobile users. With numbers like these, I believe the larger sample (i.e., DNT hits included in geo-aggregation) would support somewhat more robust results, but the smaller sample (DNT hits excluded) is fine. I worry somewhat about growth in DNT usage, but as long as DNT is not enabled by default, that’s probably not a major concern.
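As a rough back-of-the-envelope check on why the smaller sample looks acceptable, here is a minimal sketch. It assumes (and this is only an assumption) that the Firefox DNT shares quoted above are representative of all traffic, that DNT users are dropped uniformly at random, and that the standard error of an aggregate scales as 1/sqrt(n). The function name `relative_se_inflation` is hypothetical and purely for illustration.

```python
import math


def relative_se_inflation(dnt_share):
    """Factor by which the standard error grows when a fraction `dnt_share`
    of hits is excluded, assuming standard error is proportional to 1/sqrt(n).

    This is an illustrative simplification, not a property of any real pipeline.
    """
    if not 0.0 <= dnt_share < 1.0:
        raise ValueError("dnt_share must be in [0, 1)")
    return 1.0 / math.sqrt(1.0 - dnt_share)


# DNT shares taken from the Firefox figures quoted in this message.
for share in (0.05, 0.08, 0.11, 0.17):
    factor = relative_se_inflation(share)
    print(f"DNT share {share:.0%}: standard error grows by a factor of {factor:.3f}")
```

Under these assumptions, excluding an 8% DNT share widens the error by only about 4%, which is consistent with the smaller sample being fine. If DNT users differ systematically from other readers, the effect would be a bias rather than just added noise, and this sketch does not capture that.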
One thing that I would really like feedback on is: what is an acceptable k, i.e., how large should the set of users be from whom a specific user is indistinguishable? I believe this will have a significantly greater impact on the quality of our results than DNT.
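To make the question about k concrete, here is a minimal sketch of one way a threshold k could be applied to geo-aggregated counts: a (region, page) cell is published only if at least k distinct users contributed to it. The record layout `(user_id, region, page)`, the function name `aggregate_with_k_threshold`, and the sample data are all hypothetical; this is not a description of the actual proposal or of any existing pipeline.

```python
from collections import defaultdict


def aggregate_with_k_threshold(hits, k):
    """Count distinct users per (region, page) cell and keep only cells
    with at least k distinct users.

    `hits` is an iterable of (user_id, region, page) tuples (a made-up
    format for illustration). Cells below the threshold are suppressed.
    """
    users_per_cell = defaultdict(set)
    for user_id, region, page in hits:
        users_per_cell[(region, page)].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= k
    }


# Hypothetical sample data: three distinct users in Region A, one in Region B.
hits = [
    ("u1", "Region A", "Page X"),
    ("u2", "Region A", "Page X"),
    ("u3", "Region A", "Page X"),
    ("u1", "Region A", "Page X"),  # repeat visit, same user
    ("u4", "Region B", "Page X"),
]

published = aggregate_with_k_threshold(hits, k=3)
print(published)
```

With k=3, only the Region A cell survives, because the single Region B reader would otherwise be identifiable. A larger k suppresses more cells, which is why the choice of k affects the quality of the results so directly.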
Please let me know if I’ve missed anything. I’d like to rev the proposal soon, and I’d like to make it responsive to what the community thinks.
Thanks,
Reid
[Just to be absolutely clear, I’m speaking for myself, not my employer.]