Re: [Analytics] Back of the envelope data size for "Queryable public interface for pageview data" [was: Re: Queryable public interface for pageview data]

3 Oct 2013

      ...
...
Hm, I don't think we will have much trouble with the size of the input.

Well, my post was also about how to store hourly data in a concise manner
(a sparse array, really), so that we could serve hourly precision without
too much overhead.
Well, I think your files do that pretty well; there's no need to duplicate
that work.  The main desire here seems to be for a queryable database with
as much data as possible.  I think the idea is to have a reliable data
source on top of which something like stats.grok.se can be built.  Sure, we
can build this on top of flat files, but it sounds like people would rather
deal with a database.
That said, I think the database would be isomorphic to your sparse array
format, because it wouldn't store the full cross product of pages and
hours.  It would just have rows where data exists.  It would repeat the
"page_id" column, sure, but maybe hierarchical databases could help with that.
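To illustrate the isomorphism claim, here is a minimal sketch, assuming a hypothetical table named `pageviews` with columns (page_id, hour, views), a 24-hour window, made-up sample counts, and SQLite standing in for "the database". None of these names or numbers come from the thread; they are illustrative only. It flattens per-page sparse arrays into one row per nonzero cell, loads them, and rebuilds the sparse arrays from the table.

```python
import sqlite3

# Hypothetical sample data: page_id -> {hour_index: views}.
# Only hours with nonzero views are present (a sparse array per page).
sparse = {
    1: {0: 12, 5: 3},
    2: {23: 7},
    3: {},  # a page with no views in this window
}


def to_rows(sparse_counts):
    """Flatten the per-page sparse arrays into (page_id, hour, views) rows."""
    return [
        (page_id, hour, views)
        for page_id, hours in sorted(sparse_counts.items())
        for hour, views in sorted(hours.items())
    ]


def load(rows):
    """Load rows into an in-memory SQLite table: one row per nonzero cell."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pageviews (page_id INTEGER, hour INTEGER, views INTEGER, "
        "PRIMARY KEY (page_id, hour))"
    )
    conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?)", rows)
    return conn


def from_db(conn):
    """Rebuild the sparse per-page arrays from the table (the inverse mapping)."""
    out = {}
    for page_id, hour, views in conn.execute(
        "SELECT page_id, hour, views FROM pageviews ORDER BY page_id, hour"
    ):
        out.setdefault(page_id, {})[hour] = views
    return out


rows = to_rows(sparse)
conn = load(rows)
n_rows = conn.execute("SELECT COUNT(*) FROM pageviews").fetchone()[0]
dense_cells = len(sparse) * 24  # cells a full pages x hours cross product would hold
print(n_rows, dense_cells)  # 3 72
```

The table grows with the number of nonzero (page, hour) cells rather than with pages times hours; the cost is that page_id is repeated on every row, which is the overhead mentioned above.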
Dan
