My use case: historical data beyond 18 months would be really useful for
teaching data science.
This spring, I had a bunch of college programming students using the
Pageview API in combination with the standard MW API in their course
research projects. They tracked edits and views to particular pages over
time (for example, Wikipedia articles about television shows like *Game of
Thrones* and *Silicon Valley*). The goal was to understand whether the
release of a new episode or season triggered an increase in edits to the
corresponding Wikipedia article, or only an increase in views.
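For readers unfamiliar with this kind of project, here is a minimal sketch of the two requests such a project combines. It assumes the Wikimedia REST API per-article pageview route (project/access/agent/article/granularity/start/end) and the MediaWiki Action API `prop=revisions` query. The helper names `pageviews_url` and `revisions_params`, the `agent="user"` default, and the example dates are hypothetical choices for illustration, and nothing here performs a network request.

```python
from urllib.parse import quote

# Base of the Wikimedia REST API per-article pageview route (assumed layout:
# .../per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia",
                  access="all-access", agent="user", granularity="daily"):
    """Hypothetical helper: build a per-article pageview URL.

    start/end are YYYYMMDD strings. Spaces in the title become underscores,
    and the title is percent-encoded so characters like '(' are URL-safe.
    """
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title,
                     granularity, start, end])


def revisions_params(article, start_iso, end_iso):
    """Hypothetical helper: query-string parameters for the MediaWiki Action
    API (api.php) that list revision timestamps for one page in a window."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": article,
        "rvprop": "timestamp|user",
        "rvlimit": "max",
        # With rvdir=newer, rvstart is the earlier bound and rvend the later.
        "rvdir": "newer",
        "rvstart": start_iso,
        "rvend": end_iso,
        "format": "json",
    }


if __name__ == "__main__":
    print(pageviews_url("Game of Thrones", "20160401", "20160630"))
    print(revisions_params("Game of Thrones",
                           "2016-04-01T00:00:00Z", "2016-06-30T23:59:59Z"))
```

Joining the daily view counts with per-day edit counts from the revision timestamps gives the two time series the students compared around each release date. Note that pageview queries only reach as far back as the API's retention window, which is the limit discussed in this thread.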
In terms of granularity: article pageviews spike and fall rapidly in
response to external events. Reducing the granularity to weekly or monthly
would make the data less useful, because it averages out a lot of these
interesting dynamics.
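To make the granularity point concrete, here is a small sketch using made-up numbers (illustrative only, not real pageview data): a flat baseline of 1000 views per day with a one-day spike to 8000, as might follow an episode premiere. The `weekly_means` helper is hypothetical and simply averages non-overlapping 7-day windows.

```python
# Made-up daily view counts for two weeks (illustrative, not real data):
# a flat baseline with a single-day spike on day 10.
daily = [1000] * 14
daily[9] = 8000


def weekly_means(values):
    """Average consecutive, non-overlapping 7-day windows.

    A trailing partial window is averaged over its own length.
    """
    means = []
    for i in range(0, len(values), 7):
        chunk = values[i:i + 7]
        means.append(sum(chunk) / len(chunk))
    return means


weekly = weekly_means(daily)
print(weekly)                          # [1000.0, 2000.0]
print(max(daily) / min(daily))         # 8.0: the spike is 8x baseline
print(max(weekly) / min(weekly))       # 2.0: weekly data shows only 2x
```

In this toy series, an 8x one-day jump shows up as only a 2x change in the weekly series, and the exact day of the spike is lost. That matters when the question is whether views and edits respond to the same release date.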
Parsing the dumps is not a huge deal, but it involves several additional
steps and requires somewhat more expertise.
- Jonathan
On Fri, Jul 29, 2016 at 5:40 AM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
Dear Pageview API consumers,
We would like to plan storage capacity for our pageview API cluster.
Right now, with a reliable RAID setup, we can keep *18 months* of data.
If you'd like to query further back than that, you can download dump files
(which we'll make easier to use with Python utilities).
What do you think? Will you need more than 18 months of data? If so, we
need to add more nodes when we get to that point, and that costs money, so
we want to check if there is a real need for it.
Another option is to start degrading the resolution of older data (for
example, keeping only weekly or monthly data for anything older than 1
year). If you need more than 18 months, we'd love to hear your use case,
along with your requirements in a form like:
- need daily resolution for 1 year
- need weekly resolution for 2 years
- need monthly resolution for 3 years
Thank you!
Dan
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>