Hi Pine,
On Thu, Mar 27, 2014 at 11:45:59PM -0700, ENWP Pine wrote:
> No UA data is recorded from any platform for non-edit actions like
> pageviews and watchlisting, even if an editor is logged in, right?
No, that is not correct.
Currently, for each request [1] to the text and mobile caches, udp2log
holds [2] the User-Agent (column 14) and the URL (column 9). This data
gets stored in files (some parts sampled, some parts unsampled). Some
others and I have access to this data.
The same holds for Kafka and the mobile caches. There, the data is
always unsampled.
You could do all kinds of bad™ things with those data sets, if you
wanted.
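To illustrate why keeping the User-Agent and the URL together in one
record is sensitive, here is a minimal sketch that groups requested URLs
by User-Agent. It assumes the 1-based column numbers given above (URL in
column 9, User-Agent in column 14) and, as an assumption, tab-separated
fields; the real separator and field contents are documented on the
Cache_log_format page and may differ. The function names are
hypothetical, not part of any Wikimedia tool.

```python
from collections import defaultdict

# 1-based column numbers as described above; the separator is an assumption.
URL_COLUMN = 9
USER_AGENT_COLUMN = 14


def extract_url_and_ua(line, sep="\t"):
    """Return (url, user_agent) from one log line of sep-separated fields."""
    fields = line.rstrip("\n").split(sep)
    # Convert the 1-based column numbers into 0-based list indices.
    return fields[URL_COLUMN - 1], fields[USER_AGENT_COLUMN - 1]


def urls_by_user_agent(lines, sep="\t"):
    """Group requested URLs by their User-Agent string, keeping request order."""
    grouped = defaultdict(list)
    for line in lines:
        url, user_agent = extract_url_and_ua(line, sep)
        grouped[user_agent].append(url)
    return dict(grouped)
```

A rare User-Agent string ends up with its own bucket, so its list of URLs
reads like one reader's browsing history. That is the kind of tracking
the raw data makes possible.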
I do see that Ops might need that data. They should have it.
But I hope that my access, and the access of my fellow Analytics team
members, to this /raw/ data gets killed in the foreseeable future, and
that we make it effectively impossible to fingerprint or track people.
Best regards,
Christian
[1] Regardless of whether it is edit or non-edit.
Regardless of the action.
Regardless of the platform.
[2] https://wikitech.wikimedia.org/wiki/Cache_log_format
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------