Hi Nuria,
OK, so the user-agent data for edits is stored in a different database, is
heavily sampled when used for research, and will still be accessible for CU
use if user_agent_map is removed from the pageview_hourly data, right?
On Mon, Sep 28, 2015 at 10:48 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
Pine:
The pageview_hourly dataset on Hive contains pageviews, not edits.
The majority of data for edits is not associated with a user agent, as it is
stored in the MediaWiki database. Some of it comes via EventLogging, as
experiments are run in, for example, Visual Editor. This second source of
data is of a very different nature from the one we just ran this test on:
it is heavily sampled, not public, and will be purged every 90 days.
https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_a…
Thanks,
Nuria
On Mon, Sep 28, 2015 at 7:23 AM, Pine W <wiki.pine(a)gmail.com> wrote:
Hi Nuria,
Thanks for working on this.
Removing user_agent_map would be only for readership data, correct? Would
this data still be stored for edits, and if so, for how long?
Pine
On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <nuria(a)wikimedia.org> wrote:
Hello,
We have been working on the exercise of reconstructing an identity using
the (still private) pageview_hourly dataset (
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
TL;DR
It is possible (and easy) to do that with the fields the dataset has
now; before releasing it publicly, we need to anonymize it further.
More info here:
https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityRecons…
Thanks,
Nuria
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics