Fair enough - I don't use it, and I think I'd got entirely the wrong
end of the stick on what it's for! If it's intended to stop tracking
by third-party sites then it certainly seems to be of little relevance.
(It might be worth clarifying this in the proposal, in case a future
ethics-committee reviewer gets the same misapprehension?)
On 13 January 2015 at 20:24, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
I think it is reasonable to assume that the "Do not track" header isn't
referring to this.
with emphasis added.
Do Not Track is a technology and policy proposal that enables users to opt
out of tracking by websites they do not visit, [...]
Do not track is explicitly for third party tracking. We are merely
proposing to count those people who do access our sites. Note that, in this
case, we are not interested in obtaining identifiers at all, so the word
"track" seems to not apply.
It seems like we're looking for something like a "Do Not Log Anything At
All" header. I don't believe that such a thing exists -- but if it did I
think it would be good if we supported it.
On Tue, Jan 13, 2015 at 2:03 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
Hi Dario, Reid,
This seems sensible enough, and proposal #3 is clearly the better
approach. An explicit opt-in/opt-out mechanism would not be worth the
effort to build and would become yet another ignored preferences
setting after a few weeks...
A couple of thoughts:
* I understand the reasoning for not using do-not-track headers (#4);
however, it feels a bit odd to say "they probably don't mean us" and
skip them... I can almost guarantee you'll have at least one person
making a vocal fuss about not being able to opt-out without an
account. If we were to honour these headers, would it make a
significant change to the amount of data available? Would it likely
skew it any more than leaving off logged-in users?
* Option 3 does release one further piece of information over and
above those listed - an approximate ratio of logged-in versus
non-logged-in pageviews for a page. I cannot see any particular
problem with doing this (and I can think of a couple of fun things to
use it for) but it's probably worth being aware of.
On 13 January 2015 at 07:26, Dario Taraborelli wrote:
I’m sharing a proposal that Reid Priedhorsky and his collaborators at
Los Alamos National Laboratory recently submitted to the Wikimedia Analytics
Team aimed at producing privacy-preserving geo-aggregates of Wikipedia
pageview data dumps and making them available to the public and the research
community.
Reid and his team spearheaded the use of the public Wikipedia pageview
dumps to monitor and forecast the spread of influenza and other diseases,
using language as a proxy for location. This proposal describes an
aggregation strategy adding a geographical dimension to the existing dumps.
Feedback on the proposal is welcome on the lists or the project talk
page on Meta.
Analytics mailing list
- Andrew Gray
Wiki-research-l mailing list