My use case: historical data beyond 18 months would be really useful for teaching data science.
This spring, I had a bunch of college programming students use the PageView API together with the standard MediaWiki (MW) API in their course research projects. They tracked edits and views to particular pages over time (for example, Wikipedia articles about television shows such as Game of Thrones and Silicon Valley). The goal was to understand whether the release of a new episode or season triggered an increase in edits to the Wikipedia article, or only in views.
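For readers unfamiliar with the PageView API, here is a minimal sketch of how a per-article daily request URL can be assembled. It assumes the Wikimedia REST API per-article endpoint layout (project / access / agent / article / granularity / start / end, with YYYYMMDD timestamps). The helper name `pageviews_url` and its defaults are illustrative only, not part of any official client.

```python
from urllib.parse import quote

# Assumed base path of the Wikimedia REST API per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia",
                  access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    start/end are YYYYMMDD strings. Spaces in the title become
    underscores, and the title is percent-encoded so that characters
    like '/' do not break the path.
    """
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title,
                     granularity, start, end])


url = pageviews_url("Game of Thrones", "20160401", "20160630")
print(url)
```

The JSON response contains an `items` list with one entry per day (each entry has `timestamp` and `views`). Wikimedia asks API clients to send a descriptive User-Agent header, which is worth pointing out to students.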
In terms of granularity: article pageviews spike and fall rapidly in response to external events. Reducing the granularity to weekly or monthly would make the data less useful, because it averages out a lot of these interesting dynamics.
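The toy calculation below shows how weekly aggregation flattens a short spike. The daily counts are invented for illustration and are not real pageview data: a flat baseline of 1,000 views per day, with one hypothetical 15,000-view day, such as the day an episode airs.

```python
# Hypothetical daily pageview counts for two weeks (illustrative, not real data).
daily = [1000] * 14
daily[9] = 15000  # one-day spike in the second week


def weekly_means(counts):
    """Average consecutive 7-day blocks; a trailing partial week is dropped."""
    return [sum(counts[i:i + 7]) / 7 for i in range(0, len(counts) - 6, 7)]


weekly = weekly_means(daily)
peak_ratio_daily = max(daily) / min(daily)
peak_ratio_weekly = max(weekly) / min(weekly)

print(weekly)                                # → [1000.0, 3000.0]
print(peak_ratio_daily, peak_ratio_weekly)   # → 15.0 3.0
```

At daily resolution, the spike is 15x the baseline and falls on a specific day. After weekly averaging, it shrinks to 3x, and the day it happened on can no longer be recovered. That timing is exactly what is needed to line up views and edits with an episode's release.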
Parsing the dumps is not a huge deal, but it involves several additional steps and requires somewhat more expertise.
- Jonathan