Dear Pageview API consumers,
We would like to plan storage capacity for our pageview API cluster. Right now, with a reliable RAID setup, we can keep *18 months* of data. If you'd like to query further back than that, you can download dump files (which we'll make easier to use with python utilities).
What do you think? Will you need more than 18 months of data? If so, we need to add more nodes when we get to that point, and that costs money, so we want to check if there is a real need for it.
Another option is to start degrading the resolution for older data (for example, only keeping weekly or monthly resolution for data older than 1 year). If you need more than 18 months, we'd love to hear your use case, ideally in a form like:
- need daily resolution for 1 year
- need weekly resolution for 2 years
- need monthly resolution for 3 years
Thank you!
Dan
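(For readers who haven't tried the API yet, here is a minimal sketch of the kind of query described above, using Python and the requests library. The per-article REST path and the items/timestamp/views response fields are the publicly documented ones; the article, project, and date range are just placeholders:)

    # Minimal sketch: fetch daily per-article pageview counts from the Pageview API.
    # Article, project, and date range below are placeholders.
    import requests

    URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
           "{project}/all-access/user/{article}/daily/{start}/{end}")

    def daily_views(article, project="en.wikipedia", start="20160101", end="20160131"):
        """Return a list of (timestamp, views) pairs for one article."""
        url = URL.format(project=project, article=article, start=start, end=end)
        resp = requests.get(url, headers={"User-Agent": "pageview-storage-example/0.1"})
        resp.raise_for_status()
        return [(item["timestamp"], item["views"]) for item in resp.json()["items"]]

    for ts, views in daily_views("Game_of_Thrones"):
        print(ts, views)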
Maybe it makes sense to ask Henrik (stats.grok.se) for his download stats. We at wikipediatrends.com usually receive < 5 requests/month for the full data (going back to 2008).
That's very useful, Alex, thanks! I guess those requests would need to be covered by dumps anyway, since we only have data back to 2015. I'll ping Henrik too.
Glad to be helpful! 😁 It is interesting that, contrary to my predictions, the number of visitors to wikipediatrends.com and the number of downloads did not drop when the API became available. I am wondering whether stats.grok.se noticed a significant drop. Alex
My personal use cases, which primarily involve the visualization tools, would appreciate more dimensionality in daily and weekly views (which increases storage). I think you should definitely degrade the resolution, possibly more aggressively than you propose. RRDtool has been doing this for decades, so people are used to it.
-Toby
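(To make the RRDtool-style degradation concrete, here is a rough sketch of rolling daily counts up into weekly and monthly buckets once they pass a cutoff. The dict-of-dates layout is purely illustrative and not the cluster's actual storage schema:)

    # Rough sketch of resolution degradation: keep full daily resolution for
    # recent data, aggregate older data into weekly and monthly buckets.
    # The in-memory dict layout is illustrative only, not the real storage model.
    from collections import defaultdict
    from datetime import timedelta

    def degrade(daily_counts, keep_daily_days=365):
        """daily_counts: {datetime.date: views}. Returns (daily, weekly, monthly)."""
        cutoff = max(daily_counts) - timedelta(days=keep_daily_days)
        daily, weekly, monthly = {}, defaultdict(int), defaultdict(int)
        for day, views in daily_counts.items():
            if day > cutoff:
                daily[day] = views                       # recent: keep daily
            else:
                year, week, _ = day.isocalendar()
                weekly[(year, week)] += views            # older: weekly buckets
                monthly[(day.year, day.month)] += views  # older: monthly buckets
        return daily, weekly, monthly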
Would that mean that API calls would break over time? E.g., if I had a call for daily resolution over a specific time period 17 months back, would it break in a month?
/Jan Ainali (sent on the go, so please excuse my brevity)
My use case: historical data beyond 18 months would be really useful for teaching data science.
This spring, I had a bunch of college programming students using the Pageview API in combination with the standard MW API in their course research projects. They tracked edits and views to particular pages over time (example: Wikipedia articles about television shows like *Game of Thrones* and *Silicon Valley*). The goal was to understand whether the release of a new episode/season triggered an increase in edits to the Wikipedia article, or just views.
In terms of granularity: article pageviews spike and fall rapidly in response to external events. Reducing the granularity to weekly or monthly would make the data less useful, because it averages out a lot of these interesting dynamics.
Parsing the dumps is not a huge deal, but it involves several additional steps and requires somewhat more expertise.
- Jonathan
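(For context on what those extra steps look like, here is a rough sketch of pulling one article's counts out of hourly pageview dump files. The file naming and the space-separated "domain_code page_title count bytes" line format reflect my understanding of the files under dumps.wikimedia.org/other/pageviews/, so treat the details as approximate:)

    # Rough sketch: sum hourly view counts for one title across downloaded dump files.
    # Assumes the space-separated "domain_code page_title count bytes" line format.
    import glob
    import gzip

    def views_from_dumps(pattern, domain="en", title="Game_of_Thrones"):
        total = 0
        for path in glob.glob(pattern):          # e.g. "pageviews-20160701-*.gz"
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                for line in f:
                    parts = line.split(" ")
                    if len(parts) >= 3 and parts[0] == domain and parts[1] == title:
                        total += int(parts[2])
        return total

    print(views_from_dumps("pageviews-20160701-*.gz"))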
I am now checking traffic data every day to see whether Compact Language Links affect it. It makes sense to compare it not only to the previous week, but also to the same month in the previous year. So one year is hardly enough. 18 months is better, and three years is much better, because then I'll also be able to check the same month in earlier years.
I imagine that this may be useful to all product managers that work on features that can affect traffic.
Dan, Thanks for reaching out.
18 months is enough for my use cases as long as the dumps capture the exact data structure.
Best, Leila
-- Leila Zia Senior Research Scientist Wikimedia Foundation
Amir and Jonathan - thanks for speaking up for the "more than 18 months" use cases. If dumps were *much* easier to use (via python clients that made it transparent whether you were hitting the API or the dumps), would that be an acceptable solution? I get the sense that neither of your use cases is something that happens on a daily basis. If that's true, another solution would be an ad-hoc API that took a filter and a date range, applied them server-side, and gave you a partial dump with only the interesting data. If this didn't happen very often, it would let us spend processing time and a bit of dev time instead of paying for more expensive storage.
Or, if we end up needing frequent access to old data, we should be able to justify spending more money on more servers. Just trying to save as much money as possible :)
Thanks all so far, please feel free to keep chiming in if you have other use cases that haven't been covered, or if you'd like to add more weight behind the "more than 18 months" use cases.
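(Purely to make the ad-hoc idea above concrete: a hypothetical server-side filter could look roughly like the sketch below, scanning archived dump files and streaming back only the requested titles and dates. The function name, parameters, and file layout are invented for illustration; nothing here is an existing or planned interface:)

    # Hypothetical sketch of the "partial dump" idea: apply a title filter and
    # date range on the server and yield only the matching rows. Illustrative only.
    import gzip
    from pathlib import Path

    def partial_dump(archive_dir, titles, start_day, end_day, domain="en"):
        """Yield (day, hour, title, views) rows for the requested titles and dates."""
        wanted = set(titles)
        for path in sorted(Path(archive_dir).glob("pageviews-*.gz")):
            _, day, hour = path.stem.split("-")   # names like pageviews-YYYYMMDD-HH0000.gz
            if not (start_day <= day <= end_day):
                continue
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                for line in f:
                    parts = line.split(" ")
                    if len(parts) >= 3 and parts[0] == domain and parts[1] in wanted:
                        yield day, hour[:2], parts[1], int(parts[2])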
Hi Dan,
Making dumps much easier to use would definitely help. We Wikipedia researchers are kind of spoiled: we have easy public access to historical revision data for all projects, going back to 2001, through the API *and* public db endpoints like Quarry. It's only natural that we want the same thing with pageviews!!! :)
I can think of other use cases for keeping more than 18 months of data available through the API, but they're all research use cases. I don't think having lower-granularity historical data available beyond a certain point helps there: if you're doing historical analysis, you want consistency. But an application that parsed dumps on the server side to yield historical data (ideally in a format and granularity not fundamentally different from the API's, so you could join the two streams) would definitely be useful, and would probably address most research needs I can think of, inside and outside the Foundation.
Thanks for asking, Jonathan
Just curious -- how much would it cost to make all of the data available at a daily granularity for a year?
I'll let Andrew or Luca answer Toby's cost question.
Jonathan, totally agreed on the need for consistency in historical analysis. As for having edit data available back to 2001: we're working on rebuilding the editing history, and you'll see there's a lot of stuff that's not available via the APIs or Quarry. We'll talk more about that soon.