Re: [Analytics] Geo-aggregation of Wikipedia page views: Maximizing geographic granularity while preserving privacy – a proposal

13 Jan 2015

Hi Dario, Reid,

This seems sensible enough and proposal #3 is clearly the better
approach. An explicit opt-in/opt-out mechanism would not be worth the
effort to build and would become yet another ignored preferences
setting after a few weeks...

A couple of thoughts:

* I understand the reasoning for not using do-not-track headers (#4);
however, it feels a bit odd to say "they probably don't mean us" and
skip them... I can almost guarantee you'll have at least one person
making a vocal fuss about not being able to opt out without an
account. If we were to honour these headers, would it make a
significant change to the amount of data available? Would it likely
skew it any more than leaving off logged-in users?

* Option 3 does release one further piece of information over and
above those listed - an approximate ratio of logged-in versus
non-logged-in pageviews for a page. I cannot see any particular
problem with doing this (and I can think of a couple of fun things to
use it for) but it's probably worth being aware of.

Andrew.

On 13 January 2015 at 07:26, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
...
  I’m sharing a proposal that Reid Priedhorsky and his
collaborators at Los Alamos National Laboratory recently submitted to the Wikimedia
Analytics Team aimed at producing privacy-preserving geo-aggregates of Wikipedia pageview
data dumps and making them available to the public and the research community. [1]

 Reid and his team spearheaded the use of the public Wikipedia pageview dumps to monitor
and forecast the spread of influenza and other diseases, using language as a proxy for
location. This proposal describes an aggregation strategy adding a geographical dimension
to the existing dumps.

 Feedback on the proposal is welcome on the lists or the project talk page on Meta [3]

 Dario

 [1] https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pagev…
 [2] http://dx.doi.org/10.1371/journal.pcbi.1003892
 [3] https://meta.wikimedia.org/wiki/Research_talk:Geo-aggregation_of_Wikipedia_…
 _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics 

-- 
- Andrew Gray
  andrew.gray(a)dunelm.org.uk
