Anyway, I don't say that the project is impossible or unnecessary, but
there're lots of tradeoffs to be made - what kind of real time querying
workloads are to be expected, what kind of pre-filtering do people expect, etc.
I could be biased here, but I think the canonical use case for someone seeking
page view information would be viewing page view counts for a set of articles --
usually a single article, but sometimes several -- over an arbitrary
time range. Narrowing that down, I'm not sure whether the level of demand for
real-time data (say, for the previous hour) would be higher than the demand for
fast query results for more historical data. Would these two workloads imply
the kind of trade-off you were referring to? If not, could you give some
examples of what kind of expected workloads/use cases would entail such
trade-offs?
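To make that use case concrete, here is a rough sketch in Python of the lookup
I have in mind. Everything in it is an assumption for illustration only: the
hourly_views table, its columns, the hour format, and the sample rows are all
made up, and SQLite just stands in for whatever store would really be used.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a few made-up hourly counts."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: one row per (project, title, hour).
    # The primary key doubles as the index for per-article range lookups.
    conn.execute(
        """
        CREATE TABLE hourly_views (
            project TEXT NOT NULL,
            title   TEXT NOT NULL,
            hour    TEXT NOT NULL,      -- e.g. '2009-08-01T13', sorts as text
            views   INTEGER NOT NULL,
            PRIMARY KEY (project, title, hour)
        )
        """
    )
    sample_rows = [  # invented numbers, not real page view data
        ("en", "Python", "2009-08-01T00", 10),
        ("en", "Python", "2009-08-01T01", 15),
        ("en", "Python", "2009-08-01T02", 7),
        ("en", "Perl", "2009-08-01T00", 3),
        ("de", "Python", "2009-08-01T00", 4),
    ]
    conn.executemany("INSERT INTO hourly_views VALUES (?, ?, ?, ?)", sample_rows)
    return conn


def views_for_articles(conn, project, titles, start_hour, end_hour):
    """Sum views per title for start_hour <= hour < end_hour."""
    if not titles:
        return {}
    placeholders = ",".join("?" for _ in titles)
    sql = (
        "SELECT title, SUM(views) FROM hourly_views "
        "WHERE project = ? AND title IN (" + placeholders + ") "
        "AND hour >= ? AND hour < ? "
        "GROUP BY title"
    )
    params = [project, *titles, start_hour, end_hour]
    return dict(conn.execute(sql, params).fetchall())


if __name__ == "__main__":
    demo = build_demo_db()
    print(views_for_articles(demo, "en", ["Python", "Perl"],
                             "2009-08-01T00", "2009-08-01T02"))
```

With a layout like this, a lookup for one article or a handful of articles over
a time range only touches the rows for those articles, whereas ordering all
pages by view count over the same period has to aggregate over every title.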
If ordering pages by page view count for a given time period would imply such a
trade-off, then I think it'd make sense to deprioritize page ordering.
I'd be really interested to know your thoughts on an efficient schema for
organizing the raw page view data in the archives at
http://dammit.lt/wikistats/.
Thanks,
Eric