Anyway, I'm not saying that the project is impossible or unnecessary, but
there are lots of tradeoffs to be made: what kind of real-time querying
workloads are to be expected, what kind of pre-filtering people expect, etc.
I could be biased here, but I think the canonical use case for someone seeking page view information is viewing page view counts for a set of articles (most often a single article, but sometimes several) over an arbitrary time range. Narrowing that down, I'm not sure whether demand for real-time data (say, for the previous hour) would be higher than demand for fast query results over more historical data. Would serving both of these workloads imply the kind of tradeoff you were referring to? If not, could you give some examples of expected workloads or use cases that would entail such tradeoffs?
If ordering pages by page view count for a given time period would imply such a tradeoff, then I think it'd make sense to deprioritize page ordering.
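To make the two workloads concrete, here is a minimal sketch in Python with SQLite. Everything in it is an assumption on my part: the hourly_views table, its columns, the function names, and the sample rows are hypothetical and made up for illustration; none of it is taken from the actual archive format. The first function is the per-article range query I described; the second is the page-ordering query, which has to aggregate over every page in the time range and so looks like the more expensive one to support.

```python
import sqlite3

# Hypothetical schema: one row per (project, page, hour) with a view count.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hourly_views (
        project TEXT NOT NULL,
        page    TEXT NOT NULL,
        hour    TEXT NOT NULL,     -- UTC hour as sortable text, e.g. '2011-08-01T13'
        views   INTEGER NOT NULL,
        PRIMARY KEY (project, page, hour)
    )
""")

# Made-up sample rows, for illustration only.
sample_rows = [
    ("en", "Main_Page", "2011-08-01T00", 100),
    ("en", "Main_Page", "2011-08-01T01", 120),
    ("en", "Python", "2011-08-01T00", 10),
    ("en", "Python", "2011-08-01T01", 15),
    ("en", "Python", "2011-08-01T02", 5),
    ("de", "Python", "2011-08-01T00", 7),
]
conn.executemany("INSERT INTO hourly_views VALUES (?, ?, ?, ?)", sample_rows)


def views_for_pages(conn, project, pages, start_hour, end_hour):
    """Total views per page for a set of pages over [start_hour, end_hour).

    The primary key (project, page, hour) lets each page be answered
    with a short index range scan.
    """
    placeholders = ",".join("?" for _ in pages)
    sql = f"""
        SELECT page, SUM(views)
        FROM hourly_views
        WHERE project = ? AND page IN ({placeholders})
          AND hour >= ? AND hour < ?
        GROUP BY page
    """
    return dict(conn.execute(sql, [project, *pages, start_hour, end_hour]))


def top_pages(conn, project, start_hour, end_hour, limit):
    """Pages ordered by total views over [start_hour, end_hour).

    With this schema the query must aggregate every page of the project
    in the range before it can sort, unless rankings are precomputed.
    """
    sql = """
        SELECT page, SUM(views) AS total
        FROM hourly_views
        WHERE project = ? AND hour >= ? AND hour < ?
        GROUP BY page
        ORDER BY total DESC, page
        LIMIT ?
    """
    return conn.execute(sql, (project, start_hour, end_hour, limit)).fetchall()


per_page = views_for_pages(
    conn, "en", ["Main_Page", "Python"], "2011-08-01T00", "2011-08-01T02"
)
ranking = top_pages(conn, "en", "2011-08-01T00", "2011-08-01T03", 1)
```

If that reading is right, the ordering query is the one that would need precomputed aggregates or a different index, which is why I'd be fine with deprioritizing it.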
I'd be really interested to know your thoughts on an efficient schema for organizing the raw page view data in the archives at http://dammit.lt/wikistats/.
Thanks, Eric