Hm, I don't think we will have much trouble with the size of the input.
> Well my post was also about how to store hourly data in a concise manner
> (sparse array really), so we could serve hourly precision without too much
> overhead.
Well, I think your files already do that pretty well, so there's no need to
duplicate that work. The main desire here seems to be a queryable database
with as much data as possible. The idea, I think, is to have a reliable data
source on top of which something like stats.grok.se can be built. Sure, we
could build that on top of flat files, but it sounds like people would rather
deal with a database.
That said, I think the database would be isomorphic to your sparse array
format, because it wouldn't store the full cross product of pages and hours;
it would only have rows where data exists. It would repeat the "page_id"
column on every row, sure, but maybe a hierarchical database could help with
that.
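A minimal sketch of the isomorphism point, assuming SQLite and a hypothetical `hourly_views` table with `(page_id, hour, views)` columns (none of these names come from this thread). Only non-empty page/hour cells become rows, so the table round-trips to the same sparse mapping, and a missing row simply means zero views:

```python
import sqlite3

# Hypothetical sparse hourly counts: only (page_id, hour) pairs with views appear.
# Hours are assumed to be integer offsets from some epoch.
sparse_counts = {
    (1, 0): 12,
    (1, 5): 3,
    (2, 5): 40,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hourly_views (
        page_id INTEGER NOT NULL,
        hour    INTEGER NOT NULL,
        views   INTEGER NOT NULL,
        PRIMARY KEY (page_id, hour)
    )
    """
)

# One row per non-empty cell: the table holds exactly the sparse array's entries,
# not the cross product of pages and hours.
conn.executemany(
    "INSERT INTO hourly_views (page_id, hour, views) VALUES (?, ?, ?)",
    [(page, hour, views) for (page, hour), views in sparse_counts.items()],
)

# Reading the rows back reconstructs the same sparse mapping.
round_trip = {
    (page, hour): views
    for page, hour, views in conn.execute(
        "SELECT page_id, hour, views FROM hourly_views"
    )
}


def views_at(page_id, hour):
    """Return the view count for one page/hour; an absent row means zero."""
    row = conn.execute(
        "SELECT views FROM hourly_views WHERE page_id = ? AND hour = ?",
        (page_id, hour),
    ).fetchone()
    return row[0] if row else 0


# Per-page totals, the kind of aggregate a stats.grok.se-style frontend might need.
totals = dict(
    conn.execute("SELECT page_id, SUM(views) FROM hourly_views GROUP BY page_id")
)

# Compare stored rows with the size of a dense pages x hours grid.
row_count = conn.execute("SELECT COUNT(*) FROM hourly_views").fetchone()[0]
num_pages = len({page for page, _ in sparse_counts})
num_hours = max(hour for _, hour in sparse_counts) + 1
dense_cells = num_pages * num_hours

print(row_count, dense_cells)  # 3 12
```

The composite primary key keeps each (page, hour) cell unique, which is what makes the row set and the sparse array interchangeable. The repeated `page_id` per row is the overhead mentioned above.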
Dan