Re: [Wiki-research-l] [Analytics] Geo-aggregation of Wikipedia page views: Maximizing geographic granularity while preserving privacy – a proposal

13 Jan 2015

Fair enough - I don't use it, and I think I'd got entirely the wrong
end of the stick on what it's for! If it's intended to stop tracking
by third-party sites then it certainly seems to be of little relevance
here.

(It might be worth clarifying this in the proposal, in case a future
ethics-committee reviewer gets the same misapprehension?)

Andrew.

On 13 January 2015 at 20:24, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
...
  Andrew,

 I think it is reasonable to assume that the "Do not track" header isn't
 referring to this.

 From http://donottrack.us/ with emphasis added.

 Do Not Track is a technology and policy proposal that enables users to opt
 out of tracking by websites they do not visit, [...] 

 Do not track is explicitly for third party tracking.  We are merely
 proposing to count those people who do access our sites.  Note that, in this
 case, we are not interested in obtaining identifiers at all, so the word
 "track" seems to not apply.

 It seems like we're looking for something like a "Do Not Log Anything At
 All" header.  I don't believe that such a thing exists -- but if it did I
 think it would be good if we supported it.

 -Aaron

 On Tue, Jan 13, 2015 at 2:03 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk>
 wrote:

 Hi Dario, Reid,

 This seems sensible enough and proposal #3 is clearly the better
 approach. An explicit opt-in/opt-out mechanism would not be worth the
 effort to build and would become yet another ignored preferences
 setting after a few weeks...

 A couple of thoughts:

 * I understand the reasoning for not using do-not-track headers (#4);
 however, it feels a bit odd to say "they probably don't mean us" and
 skip them... I can almost guarantee you'll have at least one person
 making a vocal fuss about not being able to opt-out without an
 account. If we were to honour these headers, would it make a
 significant change to the amount of data available? Would it likely
 skew it any more than leaving off logged-in users?

 * Option 3 does release one further piece of information over and
 above those listed - an approximate ratio of logged in versus
 non-logged-in pageviews for a page. I cannot see any particular
 problem with doing this (and I can think of a couple of fun things to
 use it for) but it's probably worth being aware.

 Andrew.

 On 13 January 2015 at 07:26, Dario Taraborelli
 <dtaraborelli(a)wikimedia.org> wrote:
  I’m sharing a proposal that Reid Priedhorsky and his collaborators at
 Los Alamos National Laboratory recently submitted to the Wikimedia Analytics
 Team aimed at producing privacy-preserving geo-aggregates of Wikipedia
 pageview data dumps and making them available to the public and the research
 community. [1]

 Reid and his team spearheaded the use of the public Wikipedia pageview
 dumps to monitor and forecast the spread of influenza and other diseases,
 using language as a proxy for location. This proposal describes an
 aggregation strategy adding a geographical dimension to the existing dumps.

 Feedback on the proposal is welcome on the lists or the project talk
 page on Meta [3].

 Dario

 [1]
 https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pagev…
 [2] http://dx.doi.org/10.1371/journal.pcbi.1003892
 [3]
 https://meta.wikimedia.org/wiki/Research_talk:Geo-aggregation_of_Wikipedia_…
 _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics 

 --
 - Andrew Gray
   andrew.gray(a)dunelm.org.uk

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l 


-- 
- Andrew Gray
  andrew.gray(a)dunelm.org.uk
