Cross-posting.
---------- Forwarded message ----------
From: *Dan Andreescu* <dandreescu(a)wikimedia.org>
Date: Friday, July 29, 2016
Subject: [Analytics] [Pageview API] Data Retention Question
To: Analytics List <analytics(a)lists.wikimedia.org>
Dear Pageview API consumers,
We would like to plan storage capacity for our Pageview API cluster. Right
now, with a reliable RAID setup, we can keep *18 months* of data. If you'd
like to query further back than that, you can download the dump files (which
we'll make easier to use with Python utilities).
What do you think? Will you need more than 18 months of data? If so, we
would need to add more nodes when we get to that point, and that costs
money, so we want to check whether there is a real need for it.
Another option is to start degrading the resolution of older data (for
example, keeping only weekly or monthly totals for data older than 1 year).
If you need more than 18 months, we'd love to hear your use case, along with
your requirements in a form like this:
need daily resolution for 1 year
need weekly resolution for 2 years
need monthly resolution for 3 years
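
To make a request like the one above concrete, here is a rough Python sketch of how such a tiered policy could be applied to daily counts. It is only an illustration, not how the Pageview API cluster works: the names RETENTION_TIERS, resolution_for, bucket_key and downsample are hypothetical, and it assumes the tiers are age cutoffs (daily up to 1 year old, weekly up to 2 years, monthly up to 3 years, nothing kept beyond that), with a year approximated as 365 days.

```python
from datetime import date, timedelta

# Hypothetical policy: (maximum age in days, resolution), checked in order.
# Assumption: a year is approximated as 365 days.
RETENTION_TIERS = [
    (365, "daily"),
    (2 * 365, "weekly"),
    (3 * 365, "monthly"),
]


def resolution_for(day, today):
    """Return the resolution kept for `day`, or None if it is past retention."""
    age_days = (today - day).days
    for max_age_days, resolution in RETENTION_TIERS:
        if age_days <= max_age_days:
            return resolution
    return None


def bucket_key(day, resolution):
    """Map a day to the first day of the bucket it belongs to."""
    if resolution == "daily":
        return day
    if resolution == "weekly":
        # Monday of the week containing `day`.
        return day - timedelta(days=day.weekday())
    if resolution == "monthly":
        return day.replace(day=1)
    raise ValueError(f"unknown resolution: {resolution}")


def downsample(daily_views, today):
    """Sum daily counts into (resolution, bucket start) totals.

    `daily_views` maps a date to a view count. Days older than the
    last tier are dropped.
    """
    buckets = {}
    for day, views in daily_views.items():
        resolution = resolution_for(day, today)
        if resolution is None:
            continue
        key = (resolution, bucket_key(day, resolution))
        buckets[key] = buckets.get(key, 0) + views
    return buckets
```

With this sketch, two daily counts from the same week that are between 1 and 2 years old collapse into a single weekly total, and anything older than 3 years is discarded.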
Thank you!
Dan