> Hm, I don't think we will have much trouble with the size of the input.

Well my post was also about how to store hourly data in a concise manner (sparse array really), so we could serve hourly precision without too much overhead.  

Well, I think your files already do that pretty well, so there's no need to duplicate that work.  The main desire here seems to be for a queryable database holding as much data as possible.  I think the idea is to have a reliable data source on top of which something like stats.grok.se can be built.  Sure, we could build this on top of flat files, but it sounds like people would rather deal with a database.

That said, I think the database would be isomorphic to your sparse array format, because it wouldn't store the full cross product of pages and hours.  It would only have rows for the (page, hour) pairs where data exists.  It would repeat the "page_id" column in every row, sure, but maybe a hierarchical database could help with that.
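
To make the "rows only where data exists" point concrete, here is a rough sketch using Python's built-in sqlite3 module. The table name, column names, hour format, and sample values are all made up for illustration; this is not a proposed schema, just one way the sparse layout could look.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hourly_views (
        page_id INTEGER NOT NULL,
        hour    TEXT    NOT NULL,  -- assumed format, e.g. '2013-10-01T14'
        views   INTEGER NOT NULL,
        PRIMARY KEY (page_id, hour)
    )
    """
)

# Made-up sample data: only (page, hour) pairs that actually had views.
rows = [
    (1, "2013-10-01T00", 12),
    (1, "2013-10-01T05", 3),
    (2, "2013-10-01T05", 7),
]
conn.executemany("INSERT INTO hourly_views VALUES (?, ?, ?)", rows)


def views_for(page_id, hour):
    """Return the view count, treating a missing row as zero."""
    cur = conn.execute(
        "SELECT views FROM hourly_views WHERE page_id = ? AND hour = ?",
        (page_id, hour),
    )
    row = cur.fetchone()
    return row[0] if row else 0


# Number of rows actually stored (not pages x hours).
stored = conn.execute("SELECT COUNT(*) FROM hourly_views").fetchone()[0]
```

Two pages over two distinct hours would be four cells in a dense layout, but only three rows are stored here, and a lookup for a missing pair simply comes back as zero.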

Dan