Confirmations needed: https://www.mediawiki.org/?diff=1006822&oldid=964133
I am not sure if this is the info you are looking for, but just in case: what was said on this thread back in March still stands: http://lists.wikimedia.org/pipermail/analytics/2014-March/001681.html
Some newer information on the sanitization front:
We are seriously thinking about implementing an incognito mode. Incognito mode will be "on" by default if you browse with cookies off. That is, if your browser is set not to use cookies, no data will be sampled by EL. This idea seems to be gaining ground and will probably turn into a project soon.
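To make the proposed rule concrete, here is a minimal sketch of the decision it implies. Nothing like this exists in EL yet; the function name `should_log_event`, the `cookies_enabled` flag and the `sample_rate` parameter are all hypothetical and only illustrate the idea.

```python
import random


def should_log_event(cookies_enabled: bool, sample_rate: float) -> bool:
    """Decide whether a client event may be sampled (hypothetical sketch).

    cookies_enabled: whether the browser is set to use cookies.
    sample_rate: fraction of eligible events to keep, between 0.0 and 1.0.
    """
    if not cookies_enabled:
        # Incognito mode: cookies off means no data is sampled at all.
        return False
    # random.random() returns a float in [0.0, 1.0).
    return random.random() < sample_rate
```

With this rule a client with cookies off is never sampled, whatever the sample rate is.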
Regarding anonymization: after much discussion, we believe that there is no way to properly anonymize EL data other than aggregation. Recent work on this front includes pumping EL data into Kafka, from which we can pump it into Hadoop. There, the data will go through an ETL process to be sanitized. Our current plan is to discard the original raw logs once the data is sanitized.
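As a rough illustration of what sanitizing by aggregation could look like, here is a small sketch. It is not the actual ETL job: the event field names (`schema`, `timestamp`, `userAgent`) and the `min_count` threshold for dropping small buckets are assumptions made for the example.

```python
from collections import Counter


def aggregate_events(raw_events, min_count=1):
    """Reduce raw events to per-schema, per-day counts (hypothetical sketch).

    Only the schema name and the date part of the timestamp survive;
    per-event fields such as the user agent are dropped.
    Buckets with fewer than min_count events are left out.
    """
    counts = Counter(
        (event["schema"], event["timestamp"][:10])  # keep YYYY-MM-DD only
        for event in raw_events
    )
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up sample events, for illustration only.
sample_events = [
    {"schema": "MediaViewer", "timestamp": "2014-05-16T09:17:00", "userAgent": "UA-1"},
    {"schema": "MediaViewer", "timestamp": "2014-05-16T18:34:00", "userAgent": "UA-2"},
    {"schema": "MultimediaViewerDuration", "timestamp": "2014-05-16T18:35:00", "userAgent": "UA-1"},
]
daily_counts = aggregate_events(sample_events)
```

Once counts like these are stored, the raw rows they came from are no longer needed, which is what would allow the original logs to be discarded.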
Note that IPs are sanitized in EL and always have been. That is not the case for user agents, which are stored raw.
Let us know if you are looking for any information beyond what is provided here.
Thanks,
Nuria
On Fri, May 16, 2014 at 6:34 PM, Ori Livneh ori@wikimedia.org wrote:
On Fri, May 16, 2014 at 9:17 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
- From 40 to 260 events logged per second in a month: what's going on?
Eep, thanks for raising the alarm. MediaViewer is 170 events / sec, MultimediaViewerDuration is 38 / sec.
+CC Multimedia.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics