SeanBR: Can we please purge stuff? :-)
Hi! I'd like to hear from stakeholders about purging old data from the EventLogging database. Yes, no, why [not], etc.
I understand from Ori that there is a 90-day retention policy, and that purging has been discussed before but never carried out, for various reasons. There are certainly many rows with timestamps older than 90 days still in the db, and as far as I can tell they are largely untouched by queries.
Perhaps we're now in a better position to do this properly, given that the data lives in multiple places: log files, the database, Hadoop...
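As a rough illustration of what a 90-day purge could look like, here is a minimal Python sketch. It assumes each EventLogging table has a `timestamp` column holding 14-digit MediaWiki-style timestamps (YYYYMMDDHHMMSS), which sort chronologically as plain strings. The table name `ExampleSchema_1234` is hypothetical, and SQLite stands in for the real MySQL database so the example runs on its own. Deleting in small batches keeps each transaction short, which matters on a busy production table.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # retention window mentioned in the thread


def mw_timestamp(dt: datetime) -> str:
    """Format a datetime as a 14-digit MediaWiki-style timestamp (YYYYMMDDHHMMSS)."""
    return dt.strftime("%Y%m%d%H%M%S")


def purge_old_rows(conn, table, now, retention_days=RETENTION_DAYS, batch_size=1000):
    """Delete rows older than `retention_days` in batches; return the number deleted.

    `table` is interpolated into the SQL (identifiers cannot be bound as
    parameters), so it must come from a trusted list of schema tables.
    """
    cutoff = mw_timestamp(now - timedelta(days=retention_days))
    total = 0
    while True:
        # SQLite has no DELETE ... LIMIT by default, so select a batch of rowids first.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE timestamp < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # short transactions avoid holding locks for long
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total


# Demo with a hypothetical table and six events aged 1 to 365 days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleSchema_1234 (id INTEGER PRIMARY KEY, timestamp TEXT)")
now = datetime(2014, 6, 1, tzinfo=timezone.utc)
ages = (1, 30, 89, 91, 120, 365)
conn.executemany(
    "INSERT INTO ExampleSchema_1234 (timestamp) VALUES (?)",
    [(mw_timestamp(now - timedelta(days=d)),) for d in ages],
)
deleted = purge_old_rows(conn, "ExampleSchema_1234", now, batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM ExampleSchema_1234").fetchone()[0]
print(deleted, remaining)  # 3 3
```

On the real MySQL tables the same loop could use `DELETE ... WHERE timestamp < ... LIMIT n` directly. Either way, anything that must outlive the window would need to be aggregated or sanitized before the purge runs.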
All, I wanted to hear your thoughts informally (before posting to the lists) on two ideas that have been floating around recently:
1) add support for optional sampling in EventLogging via JSON schemas (given the sheer number of teams who have asked for it). See https://bugzilla.wikimedia.org/show_bug.cgi?id=65500
2) introduce 90-day pruning by default for all logs, with a dedicated schema element to override the default.
This would shift to the customers the responsibility for ensuring that the right data is collected and retained.
I understand 2) has already been partly implemented for the raw JSON logs (but not yet for EL data stored in SQL). Obviously, we would need to audit existing logs to make sure that we don't discard data that needs to be retained, in sanitized or aggregate form, beyond 90 days.
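Both proposals could be driven by the schema itself. The sketch below assumes two hypothetical schema elements, `sampleRate` (the fraction of sessions to log) and `retentionDays` (an override of the 90-day default); both names are made up for illustration, and hashing with SHA-1 is just one reasonable choice. Hashing a per-session token, rather than drawing a random number per event, keeps sampling consistent: a given session is either always in the sample or always out of it.

```python
import hashlib

DEFAULT_RETENTION_DAYS = 90  # default proposed in 2)


def retention_days(schema: dict) -> int:
    """Return the schema's retention window, falling back to the 90-day default.

    `retentionDays` is a hypothetical schema element used for illustration.
    """
    return int(schema.get("retentionDays", DEFAULT_RETENTION_DAYS))


def in_sample(schema: dict, token: str) -> bool:
    """Decide deterministically whether events carrying `token` are logged.

    `sampleRate` is a hypothetical schema element; if absent, log everything.
    """
    rate = float(schema.get("sampleRate", 1.0))
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"sampleRate must be in [0, 1], got {rate}")
    # Hash schema title + token so different schemas sample independently.
    digest = hashlib.sha1(f"{schema['title']}:{token}".encode("utf-8")).digest()
    # Keep the top 53 bits so the bucket is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


schema = {"title": "ExampleSchema", "sampleRate": 0.1, "retentionDays": 365}
tokens = [f"session-{i}" for i in range(10_000)]
kept = sum(in_sample(schema, t) for t in tokens)
print(f"kept {kept} of {len(tokens)} sessions")
print(retention_days(schema))              # 365 (explicit override)
print(retention_days({"title": "Other"}))  # 90 (default)
```

Under this design the retention value would travel with the schema, so a pruning job could read it per table instead of relying on a single global setting.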