This is not our intention for the long term; we are in the middle of putting in place a sanitization strategy to get rid of any PII after
90 days.
This discussion might make more sense in another thread though. Please do not hijack Sajjad's thread :)
So far, our internal discussion regarding sanitization has produced the following:
For users who have ethical concerns about their data being gathered via EventLogging, we have considered providing an incognito mode. Incognito mode would be "on" by default if you browse with cookies off. That is, if your browser is set not to use cookies, no data would be sampled. This is so far just an idea.
Regarding anonymization: after much discussion, we believe that the only way to properly anonymize EventLogging data is aggregation, and for that we need to build infrastructure that will "consume" EventLogging events. At this time EventLogging only samples discrete events, so data is stored as discrete data points. That being said, IPs are always anonymized in any EventLogging dataset. User Agents are not.
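To make the aggregation idea concrete, here is a minimal sketch of what a "consumer" of discrete events could do. It is only an illustration, not the infrastructure we plan to build: the event fields `schema` and `timestamp`, the function name `aggregate_events`, and the `min_count` threshold are all assumptions made up for this example.

```python
from collections import Counter


def aggregate_events(events, min_count=1):
    """Collapse discrete events into per-schema, per-day counts.

    Hypothetical event shape: a dict with a "schema" name and an ISO 8601
    "timestamp" string. Every other field (User Agent, hashed IP, ...) is
    dropped, so only the counts survive.
    """
    # The first 10 characters of an ISO 8601 timestamp are the date (YYYY-MM-DD).
    counts = Counter((e["schema"], e["timestamp"][:10]) for e in events)
    # Optionally suppress small buckets; min_count is an assumed knob.
    return {key: n for key, n in counts.items() if n >= min_count}
```

Calling `aggregate_events(events, min_count=2)` would keep only the (schema, day) buckets that contain at least two events.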
We will update this wiki page in the near future with more information: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
On Thu, Mar 13, 2014 at 12:43 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
On Thu, Mar 13, 2014 at 9:32 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Andrew Gray, 13/03/2014 00:56:
For that matter, surely this data won't exist anyway before 2013 or so?
I'm not sure how long we retain IP data for logged-in users, but I'd be a bit startled if it was five years.
EventLogging can contain almost anything I think. Is there any purging? I don't think so. Is it aggregate and anonymised? No longer. <https://www.mediawiki.org/w/index.php?title=Extension:EventLogging&diff=prev&oldid=905171>
On Thu, Mar 13, 2014 at 5:19 AM, Nuria Ruiz nuria@wikimedia.org wrote:
Sorry but this is not correct:
IP addresses are anonymized in Event Logging and always have been. We calculate an HMAC with a rotating salt that changes either every 90 days or on a service restart.
Event Logging data has never been aggregated; it is a system to log discrete events. There have not been any changes in this regard as of late.
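For readers unfamiliar with the IP anonymization described above (an HMAC with a rotating salt), here is a rough sketch of the general technique. It is not the actual EventLogging code: the class name `RotatingSaltHasher`, the choice of SHA-256, the 32-byte salt, and the injectable `clock` are assumptions for illustration. Only the 90-day rotation and the rotation on restart come from the message.

```python
import hashlib
import hmac
import os
import time

ROTATION_SECONDS = 90 * 24 * 60 * 60  # 90 days, as stated in the message


class RotatingSaltHasher:
    """Hypothetical helper: keyed hash of IPs with an in-memory, rotating salt."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so rotation can be exercised in tests
        self._rotate()  # the salt lives only in memory, so a restart rotates it

    def _rotate(self):
        self._salt = os.urandom(32)  # assumed salt size
        self._created = self._clock()

    def anonymize(self, ip):
        # Replace the salt once it is 90 days old.
        if self._clock() - self._created >= ROTATION_SECONDS:
            self._rotate()
        # SHA-256 is an assumption; the message does not name the digest.
        return hmac.new(self._salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```

Within one salt period the same IP always maps to the same token, so events can still be correlated. Once the salt rotates and the old one is discarded, tokens from different periods can no longer be linked to each other.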
What Nuria said is correct; however, we do currently store some data, such as User Agents. This is not our intention for the long term; we are in the middle of putting in place a sanitization strategy to get rid of any PII after 90 days. This discussion might make more sense in another thread though. Please do not hijack Sajjad's thread :)
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics