[Foundation-l] Fw: Strike against the collection of personal data through edit links

Dario Taraborelli dtaraborelli at wikimedia.org
Fri Feb 10 21:27:58 UTC 2012


I put together a short explanation of how clicktracking works, what data it stores and why we use it. I'll work with Oliver to make sure this is also captured in the AFT5 FAQ. Feel free to contact me off-list if you have specific questions that I haven't answered here.

Dario


* What is clicktracking?

Clicktracking is an extension developed by the Wikimedia Foundation during the Usability initiative [1]. It has been used since then to test a number of features or to run some small-scale usability experiments.

* How does it work?

The extension collects click-through data (e.g. it counts "clicks on a call to action after posting article feedback") that is typically not stored in the database. An example of the data collected by this extension can be found in the log format specification [2].
 
* Why do we use clicktracking in AFT?

We use clicktracking to measure aggregate click-through/completion rates as part of our analysis of AFT [3]. We randomly assign users to different "buckets" or experimental conditions (e.g. a specific AFT design) or to a control group. This allows us to measure how the conditions perform relative to one another. For example, we want to know how a specific AFT design affects editing behavior, or how many people who see the AFT widget at a specific placement act on a call to action. The two main reasons why we use the clicktracking extension for this purpose are (1) to capture bucket information, which is not stored in the database, and (2) to measure drop-off rates for specific funnels (e.g. how many users browse away after clicking on a button).
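
To make the idea concrete, here is a minimal sketch in Python of random bucket assignment and per-bucket completion and drop-off rates. It is not the extension's code: the bucket names, the event names ("impression", "completed") and both functions are assumptions made up for illustration.

```python
import random
from collections import Counter

# Hypothetical condition names; the real experiments define their own buckets.
BUCKETS = ["control", "design_a", "design_b"]


def assign_bucket(rng: random.Random) -> str:
    """Assign a visitor to one experimental condition uniformly at random."""
    return rng.choice(BUCKETS)


def completion_rates(events):
    """Compute per-bucket completion and drop-off rates.

    `events` is an iterable of (bucket, event_name) pairs. An "impression"
    marks the start of a funnel and "completed" marks its end (both names
    are assumptions for this sketch).
    """
    events = list(events)
    shown = Counter(b for b, e in events if e == "impression")
    done = Counter(b for b, e in events if e == "completed")
    rates = {}
    for bucket, n_shown in shown.items():
        rate = done[bucket] / n_shown
        rates[bucket] = {"completion": rate, "drop_off": 1 - rate}
    return rates
```

Only counts per bucket are needed for this kind of comparison, which is why the analysis works on aggregates rather than on individual users.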

As such, the extension is used to count events for groups of users; it is not designed to track individuals, let alone store personally identifiable information. For example, it does NOT store user IDs or usernames for registered editors; instead, it assigns and stores a randomly generated token for every user.
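
As a rough illustration of what "randomly generated token" means in practice, the sketch below builds an event record that carries a random token, a bucket and an event name, and nothing derived from an account. The field names and both helper functions are hypothetical and are not taken from the extension; the actual log format is described in [2].

```python
import secrets


def new_token() -> str:
    """Return a random token that is not derived from any account data."""
    return secrets.token_hex(16)  # 16 random bytes, rendered as 32 hex chars


def make_event(token: str, bucket: str, event_name: str) -> dict:
    """Build a log record holding only the token, the bucket and the event.

    No user ID or username is passed in, so none can end up in the record.
    """
    return {"token": token, "bucket": bucket, "event": event_name}
```

Because the token is random rather than computed from a username or user ID, it cannot be reversed to recover an account.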

* Why these ugly URLs when I click on a section link?

Clicktracking is usually implemented via JavaScript and session cookies, but in some cases it's easier to just pass a URL parameter when a form is submitted. We appreciate that the AFT5 implementation of clicktracking is not very elegant and we will disable it as soon as we've collected the data needed for the analysis.
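
For readers wondering what "passing a URL parameter" amounts to, here is a small Python sketch that appends a tracking parameter to a link and reads it back on the receiving side. The parameter name `ct_event`, its value and the example URL are invented for illustration; they are not the parameters AFT5 actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_tracking_param(url: str, name: str, value: str) -> str:
    """Return `url` with one extra query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_tracking_param(url: str, name: str):
    """Return the value of the query parameter `name`, or None if absent."""
    return dict(parse_qsl(urlsplit(url).query)).get(name)


# Hypothetical section edit link carrying a tracking parameter.
link = add_tracking_param(
    "http://example.org/w/index.php?title=Foo&action=edit",
    "ct_event",
    "section_edit_link",
)
```

This is what makes the links longer: the extra parameter travels with the request, so the click can be counted without relying on JavaScript or a cookie.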

* What is the status of data collected via clicktracking? 

Data collected via the clicktracking extension is subject to the privacy policy [4] and as such is not publicly released, except in fully anonymized or aggregate form.

[1] http://www.mediawiki.org/wiki/Extension:ClickTracking
[2] http://meta.wikimedia.org/wiki/Research:Article_feedback/Clicktracking#Log_format_specification
[3] http://meta.wikimedia.org/wiki/Research:Article_feedback/Data_and_metrics
[4] http://wikimediafoundation.org/wiki/Privacy_policy

On Feb 5, 2012, at 9:32 PM, Howie Fung wrote:

> We would be able to look at just the edit summaries, but that would only
> provide us with analysis on edits that were successfully completed.  By
> including the actual clicks in the tracking, we can do analysis on the
> edit/save ratio (% of total edit attempts that were successfully saved).
> 
> Howie
> 
> On Sat, Feb 4, 2012 at 6:09 PM, Brandon Harris <bharris at wikimedia.org> wrote:
> 
>> 
>>       I'm not sure why this couldn't be done if that were all that is
>> being measured.  I suspect there's other behaviors being tracked.
>> 
>>       As I said, I'm not the person who knows most about this, so you
>> have to take what I am saying with a grain of salt.
>> 
>> 
>> 
>> On 2/4/12 5:21 PM, WereSpielChequers wrote:
>> 
>>> Hi Brandon, thanks for the explanation, but wouldn't it be easier to just
>>> analyse edit summaries? If you edit by section the edit summary defaults to
>>> start with the section heading.......
>>> 
>>> Were SpielChequers
>>> 
>>> Message: 7
>>> 
>>>> Date: Sat, 04 Feb 2012 14:51:49 -0800
>>>> From: Brandon Harris <bharris at wikimedia.org>
>>>> To: foundation-l at lists.wikimedia.org
>>>> Subject: Re: [Foundation-l] Fw: Strike against the collection of
>>>>       personal data through edit links
>>>> Message-ID: <4F2DB685.70000 at wikimedia.org>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>> 
>>>> 
>>>>       (This may not be 100% accurate; the person who knows most about
>>>> this is on vacation, but I'll try to explain to the best of my
>>>> understanding.)
>>>> 
>>>>       Those weird URLs are part of a clicktracking process.  It's a test to
>>>> see how people go about editing the page *most often* (by section, or by
>>>> edit tab) and further to see how effective various calls-to-action (such
>>>> as those given by Article Feedback) are.
>>>> 
>>>>       The longevity of the data isn't something I can comment to but I'd be
>>>> surprised if it lasted even 3 months.  I do not know if there are
>>>> identity markers connected to them but I wouldn't be surprised.
>>>> 
>>>>       To that end, the data is only useful in roll-ups, and wouldn't be
>>>> something published anywhere except in aggregate.
>>>> 
>>>> 
>>>> 
>>>> On 2/4/12 2:27 PM, Philippe Beaudette wrote:
>>>> 
>>>>> MZ is correct:  3 months is the purge for Checkuser data.
>>>>> 
>>>>> As to the rest of it, Diederick van Liere, our resident guru of data, will
>>>>> be checking into this, and will confirm back when we know exactly what is
>>>>> intended by the devs for that data.  I will say that generally speaking,
>>>>> the Foundation prefers to maintain the minimum data possible for the
>>>>> shortest period of time.
>>>>> 
>>>>> Thanks,
>>>>> pb
>>>>> ___________________
>>>>> Philippe Beaudette
>>>>> Head of Reader Relations
>>>>> Wikimedia Foundation, Inc.
>>>>> 
>>>>> 415-839-6885, x 6643
>>>>> 
>>>>> philippe at wikimedia.org
>>>>> 
>>>>> To check my email volume (and thus know approx how long it will take me to
>>>>> respond), go to http://courteous.ly/hpQmqy
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Feb 4, 2012 at 2:19 PM, MZMcBride <z at mzmcbride.com> wrote:
>>>>> 
>>>>>> Fred Bauder wrote:
>>>>>>> David Gerard wrote:
>>>>>>>> 3 months I can live with :-) Can someone from WMF just confirm what
>>>>>>>> data is kept for how long?
>>>>>>> 
>>>>>>> The exact time is confidential.
>>>>>>> 
>>>>>> 
>>>>>> Err, no, I don't think so. It's not defined in the files at
>>>>>> <http://noc.wikimedia.org/conf/>, which means it should be using the
>>>>>> default, as defined at
>>>>>> <http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUser.php?revision=106556&view=markup>.
>>>>>> From that file:
>>>>>> 
>>>>>> ---
>>>>>> # How long to keep CU data?
>>>>>> $wgCUDMaxAge = 3 * 30 * 24 * 3600; // 3 months
>>>>>> ---
>>>>>> 
>>>>>> The last attempt to change this value (without community discussion) was
>>>>>> summarily shot down:
>>>>>> <http://svn.wikimedia.org/viewvc/mediawiki?view=revision&revision=40847>.
>>>>> 
>>>>>> 
>>>>>> That's only CheckUser data, though. I'm not sure what David wants confirmed
>>>>>> from the Wikimedia Foundation. Different data has different expiries. A lot
>>>>>> of it is permanent (e.g., revisions aren't going anywhere for the most
>>>>>> part). I guess the question is specific to the ClickTracking extension:
>>>>>> <https://www.mediawiki.org/wiki/Extension:ClickTracking>?
>>>>>> 
>>>>>> MZMcBride
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> foundation-l mailing list
>>>>>> foundation-l at lists.wikimedia.org
>>>>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Brandon Harris, Senior Designer, Wikimedia Foundation
>>>> 
>>>> Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
>>>> 
>>> 
>> 
>> --
>> Brandon Harris, Senior Designer, Wikimedia Foundation
>> 
>> Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
>> 
>> 



