Hi Pine,
On Thu, Mar 27, 2014 at 11:45:59PM -0700, ENWP Pine wrote:
> No UA data is recorded from any platform for non-edit actions like pageviews and watchlisting, even if an editor is logged in, right?
No, that is not correct.
Currently, for each request [1] to the text and mobile caches, udp2log holds [2] the User-Agent (column 14) and the URL (column 9). This data gets stored away into files (some parts sampled, some parts unsampled). I and some others have access to this data.
The same holds for Kafka and the mobile caches. There, the data is always unsampled.
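To make the column numbers above concrete, here is a minimal sketch of pulling the URL and the User-Agent out of a single udp2log line. It rests on assumptions that are not stated in this mail: that fields are tab-separated, and that the column numbers count from 1. The function name and the sample line are made up for illustration; please check [2] for the real field layout.

```python
# Assumptions (not from the original mail): fields are tab-separated and
# columns are counted from 1, so column 9 is index 8 and column 14 is index 13.
URL_COLUMN = 9
USER_AGENT_COLUMN = 14


def extract_url_and_user_agent(line, delimiter="\t"):
    """Return (url, user_agent) from one log line, or None if it is too short."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < USER_AGENT_COLUMN:
        return None
    return fields[URL_COLUMN - 1], fields[USER_AGENT_COLUMN - 1]


# Made-up sample line; the other columns only hold placeholder values.
sample = "\t".join(
    ["f%d" % i for i in range(1, 9)]
    + ["http://en.wikipedia.org/wiki/Main_Page"]
    + ["f%d" % i for i in range(10, 14)]
    + ["Mozilla/5.0 (X11; Linux x86_64)"]
)
print(extract_url_and_user_agent(sample))
```

So a plain pageview, without any edit, already yields a (URL, User-Agent) pair per request.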
You could do all kinds of bad™ things with those data sets, if you wanted.
I do see that Ops might need that data. They should have it.
But I hope that my access to this /raw/ data, and the access of other fellow Analytics team members to it, gets killed in the foreseeable future, and that we effectively make it impossible to fingerprint/track people around.
Best regards, Christian
[1] Regardless of whether it is edit or non-edit. Regardless of the action. Regardless of the platform.
[2] https://wikitech.wikimedia.org/wiki/Cache_log_format