Pine:
The pageview_hourly dataset on Hive contains pageviews, not edits. The majority of edit data is not associated with a user agent, as it is stored in the MediaWiki database. Some of it comes via EventLogging as experiments are run in, for example, VisualEditor. This second source of data is very different in nature from the one we just ran this test on: it is heavily sampled, not public, and will be purged every 90 days.
Thanks,
Nuria

On Mon, Sep 28, 2015 at 7:23 AM, Pine W <wiki.pine@gmail.com> wrote:
Hi Nuria,
Thanks for working on this.
Removing user_agent_map would be only for readership data, correct? Would this data still be stored for edits, and if so, for how long?
Pine

On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <nuria@wikimedia.org> wrote:
Hello,
We have been working on the exercise of reconstructing an identity using the (still private) pageview_hourly dataset (https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly).
TL;DR
It is possible (and easy) to do that with the fields the dataset has now; before releasing it publicly we need to anonymize it further.
More info here:
Thanks,
Nuria
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics