Cross posting.
---------- Forwarded message ----------
From: *Dan Andreescu* dandreescu@wikimedia.org
Date: Friday, July 29, 2016
Subject: [Analytics] [Pageview API] Data Retention Question
To: Analytics List analytics@lists.wikimedia.org
Dear Pageview API consumers,
We would like to plan storage capacity for our pageview API cluster. Right now, with a reliable RAID setup, we can keep *18 months* of data. If you'd like to query further back than that, you can download dump files (which we'll make easier to use with Python utilities).
What do you think? Will you need more than 18 months of data? If so, we will need to add more nodes when we get to that point, and that costs money, so we want to check whether there is a real need for it.
Another option is to start degrading the resolution for older data (for example, keeping only weekly or monthly data once it is older than 1 year). If you need more than 18 months, we'd love to hear your use case, along with something in the form of:
need daily resolution for 1 year
need weekly resolution for 2 years
need monthly resolution for 3 years
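To make that form concrete, here is a minimal Python sketch of how such a tiered policy could be read. Everything in it is an assumption for illustration: the name `resolution_for`, the `RETENTION_TIERS` table, and the reading of the tiers as cumulative ages (daily up to 1 year old, weekly up to 2 years, monthly up to 3 years, dropped after that). It does not describe how the pageview API cluster actually stores or expires data.

```python
from datetime import date
from typing import Optional

# Hypothetical tiers mirroring the example form above:
# (maximum age in days, finest resolution still kept at that age).
RETENTION_TIERS = [
    (365, "daily"),
    (2 * 365, "weekly"),
    (3 * 365, "monthly"),
]


def resolution_for(day: date, today: date) -> Optional[str]:
    """Return the finest resolution kept for `day`, or None if it has aged out."""
    age_days = (today - day).days
    for max_age, resolution in RETENTION_TIERS:
        if age_days <= max_age:
            return resolution
    return None


if __name__ == "__main__":
    today = date(2016, 7, 29)
    for day in (date(2016, 1, 1), date(2015, 1, 1), date(2014, 1, 1), date(2013, 1, 1)):
        print(day, resolution_for(day, today))
```

A reply in this form amounts to choosing the three age cut-offs in the table; anything older than the last tier would only be available from the dump files.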
Thank you!
Dan
For the iOS app I can say that 18 months is more than enough for our current feature set and upcoming plans.
Even if we began displaying graphs of page views over time, I can’t see any need to go back more than a few weeks or months.
For historical data, the idea of degrading the resolution is an interesting one. I think that daily data becomes much less important the further back in time you go. Even if we only kept daily data for 6 months, that would be enough for our use cases.
This is probably true for Android as well, since we have pretty similar UI, but I’ll let them chime in to be sure.
Let me know if you need any further info.
On Fri, Jul 29, 2016 at 10:31 AM, Adam Baso abaso@wikimedia.org wrote:
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l