I think that before we settle on a specific data store, we should determine which queries people are most interested in running, whether they expect scripted access to this data or whether we should primarily design a tool for human access, and whether applying a threshold to cut the long tail of low-traffic articles is a good approach for most consumers of this data.
The GLAM case described by Magnus is pretty well-defined, but I'd like to point out that:
• a large number of Wikipedias link to stats.grok.se from the history page of every single article
• most researchers I've been talking to are interested in daily or hourly pageview (pv) data per article
Should we list the requirements for the different use cases on a wiki page, where more people than the participants in this thread can voice their needs?
Dario
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics