On Tue, Jan 13, 2015 at 3:24 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
I think it is reasonable to assume that the "Do Not Track" header isn't
referring to this.
With emphasis added:
Do Not Track is a technology and policy proposal that enables users to
opt out of *tracking by websites they do not visit*, [...]
Do Not Track is explicitly about third-party tracking. We are merely
proposing to count the people who do access our sites. Note that, in
this case, we are not interested in obtaining identifiers at all, so the
word "track" does not seem to apply.
It seems like we're looking for something like a "Do Not Log Anything At
All" header. I don't believe that such a thing exists -- but if it did I
think it would be good if we supported it.
On Tue, Jan 13, 2015 at 2:03 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
Hi Dario, Reid,
This seems sensible enough, and proposal #3 is clearly the better
approach. An explicit opt-in/opt-out mechanism would not be worth the
effort to build and would become yet another ignored preference
setting after a few weeks...
A couple of thoughts:
* I understand the reasoning for not using Do Not Track headers (#4);
however, it feels a bit odd to say "they probably don't mean us" and
skip them... I can almost guarantee you'll have at least one person
making a vocal fuss about not being able to opt out without an
account. If we were to honour these headers, would it make a
significant change to the amount of data available? Would it be likely
to skew the data any more than leaving out logged-in users does?
* Option 3 does release one further piece of information over and
above those listed: an approximate ratio of logged-in to
non-logged-in pageviews for a page. I cannot see any particular
problem with doing this (and I can think of a couple of fun things to
use it for), but it's probably worth being aware of.
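As a rough illustration of the first question above (how much data honouring the header would cost), the sketch below estimates the share of requests that would be excluded if `DNT: 1` were treated as an opt-out. It is a minimal sketch under stated assumptions: each request is represented as a hypothetical dict of HTTP headers, and the names `has_dnt_opt_out` and `dnt_share` are illustrative only, not part of the proposal or of any Wikimedia pipeline. Following the W3C Tracking Preference Expression draft, only a value of `1` is counted as an opt-out; `0` or a missing header is not.

```python
from typing import Iterable, Mapping


def has_dnt_opt_out(headers: Mapping[str, str]) -> bool:
    """Return True if the (hypothetical) header dict carries DNT: 1."""
    # HTTP header names are case-insensitive, so compare lowercased keys.
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def dnt_share(requests: Iterable[Mapping[str, str]]) -> float:
    """Fraction of requests that would be dropped if DNT were honoured."""
    total = 0
    opted_out = 0
    for headers in requests:
        total += 1
        if has_dnt_opt_out(headers):
            opted_out += 1
    # Avoid dividing by zero on an empty sample.
    return opted_out / total if total else 0.0


# Illustrative (made-up) sample, not real traffic data.
sample = [
    {"User-Agent": "a", "DNT": "1"},
    {"User-Agent": "b"},
    {"user-agent": "c", "dnt": "0"},
    {"User-Agent": "d", "Dnt": "1"},
]
print(dnt_share(sample))  # 0.5
```

Run over a real request sample, the same ratio would indicate whether dropping DNT requests meaningfully shrinks or skews the aggregates, which could then be compared against the effect of excluding logged-in users.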
On 13 January 2015 at 07:26, Dario Taraborelli wrote:
I’m sharing a proposal that Reid Priedhorsky and his collaborators at
Los Alamos National Laboratory recently submitted to the Analytics Team,
aimed at producing privacy-preserving geo-aggregates of the Wikipedia
pageview data dumps and making them available to the public and the
research community.
Reid and his team spearheaded the use of the public Wikipedia pageview
dumps to monitor and forecast the spread of influenza and other diseases,
using language as a proxy for location. This proposal describes an
aggregation strategy that adds a geographical dimension to the existing
dumps.
Feedback on the proposal is welcome on the lists or on the project talk
page on Meta.
Analytics mailing list
- Andrew Gray
Wiki-research-l mailing list