Andrew,

I think it is reasonable to assume that the "Do Not Track" header isn't referring to this.

From http://donottrack.us/ with emphasis added.

Do Not Track is a technology and policy proposal that enables users to opt out of tracking by websites they do not visit, [...]

Do Not Track is explicitly aimed at third-party tracking. We are merely proposing to count the people who do access our sites. Note that in this case we are not interested in obtaining identifiers at all, so the word "track" does not seem to apply.

It seems like we're looking for something like a "Do Not Log Anything At All" header. I don't believe that such a thing exists -- but if it did I think it would be good if we supported it.

-Aaron

On Tue, Jan 13, 2015 at 2:03 PM, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:

Hi Dario, Reid,

This seems sensible enough and proposal #3 is clearly the better
approach. An explicit opt-in/opt-out mechanism would not be worth the
effort to build and would become yet another ignored preferences
setting after a few weeks...

A couple of thoughts:

* I understand the reasoning for not using do-not-track headers (#4);
however, it feels a bit odd to say "they probably don't mean us" and
skip them... I can almost guarantee you'll have at least one person
making a vocal fuss about not being able to opt out without an
account. If we were to honour these headers, would it make a
significant change to the amount of data available? Would it likely
skew it any more than leaving off logged-in users?

* Option 3 releases one further piece of information over and
above those listed: an approximate ratio of logged-in to
non-logged-in pageviews for a page. I cannot see any particular
problem with this (and I can think of a couple of fun things to
use it for), but it's probably worth being aware of.

Andrew.

On 13 January 2015 at 07:26, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
> I’m sharing a proposal that Reid Priedhorsky and his collaborators at Los Alamos National Laboratory recently submitted to the Wikimedia Analytics Team aimed at producing privacy-preserving geo-aggregates of Wikipedia pageview data dumps and making them available to the public and the research community. [1]
>
> Reid and his team spearheaded the use of the public Wikipedia pageview dumps to monitor and forecast the spread of influenza and other diseases, using language as a proxy for location. [2] This proposal describes an aggregation strategy that adds a geographical dimension to the existing dumps.
>
> Feedback on the proposal is welcome on the lists or on the project talk page on Meta. [3]
>
> Dario
>
> [1] https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
> [2] http://dx.doi.org/10.1371/journal.pcbi.1003892
> [3] https://meta.wikimedia.org/wiki/Research_talk:Geo-aggregation_of_Wikipedia_pageviews
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
- Andrew Gray
andrew.gray@dunelm.org.uk

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l