Thanks, Dan! I just expanded it even more (correcting my earlier error confusing the privacy policy and data retention guidelines). See what you think of this:
*Since this raw data identifies the location of individual editors, we keep it for only 90 days, in accordance with our data retention guidelines https://meta.wikimedia.org/wiki/Data_retention_guidelines. Data older than 90 days is continuously purged from the source cu_changes table, but since we regenerate the Data Lake's editing data every month https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster#Edit_data, we instead keep data in mediawiki_private_cu_changes and geoeditors_daily for the two latest calendar months (the month of the latest mediawiki_history https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history snapshot and the previous).*

*Older data may be temporarily available before it is purged, but you should not rely on this.*
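In case it helps anyone querying the tables, here is a rough Python sketch of how I read the "two latest calendar months" rule. The function names and the "YYYY-MM" snapshot format are my own assumptions for illustration; this is not how the purge job itself is implemented.

```python
from datetime import date
from typing import List


def retained_months(snapshot_month: str) -> List[str]:
    """Return the two calendar months ("YYYY-MM") expected to be kept.

    Assumption: snapshot_month is the month of the latest mediawiki_history
    snapshot, written as "YYYY-MM" (a hypothetical convention for this sketch).
    """
    year, month = (int(part) for part in snapshot_month.split("-"))
    # Step back one month, wrapping January to the previous December.
    prev_year, prev_month = (year - 1, 12) if month == 1 else (year, month - 1)
    return [f"{prev_year:04d}-{prev_month:02d}", f"{year:04d}-{month:02d}"]


def is_retained(day: date, snapshot_month: str) -> bool:
    """True if `day` falls in one of the two months that should be available."""
    return day.strftime("%Y-%m") in retained_months(snapshot_month)
```

For example, with a snapshot for 2018-10, a row dated 2018-09-15 would count as retained and a row dated 2018-08-31 would not, even if the older row has not been purged yet.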
On Tue, 20 Nov 2018 at 12:43, Dan Andreescu dandreescu@wikimedia.org wrote:
That's right, Neil, I just changed the language around a bit, thanks for updating that!
On Tue, Nov 20, 2018 at 3:26 PM Neil Patel Quinn nquinn@wikimedia.org wrote:
Hey there!
Could someone from Analytics clarify the purging schedule for geoeditors_daily and add it on Wikitech https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors#Generation? I've added some information based on my experience using the dataset, but it may not be fully accurate.
I wrote: *Because these tables contain the countries of individual editors, we only keep the data corresponding to the two most recent full months (the month of the latest mediawiki_history snapshot and the previous).*
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics