On Tue, Jun 9, 2015 at 11:53 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
Eric, I think we should allow arbitrary querying on any dimension for that first data block. We could pre-aggregate all of those combinations pretty easily since the dimensions have very low cardinality.
Are you thinking about something like /{project|all}/{agent|all}/{day}/{hour}, or will there be a lot more dimensions?
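To make the pre-aggregation idea concrete, here is a rough sketch of how every {project|all} x {agent|all} combination could be rolled up per day and hour. This is only an illustration: the row layout, the sample project and agent values, and the helper names `preaggregate` and `lookup` are assumptions of mine, not anything that exists in RESTBase or the analytics code.

```python
from collections import Counter
from itertools import product


def preaggregate(rows):
    """Roll up view counts for every project/agent combination.

    rows: iterable of (project, agent, day, hour, views) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = Counter()
    for project, agent, day, hour, views in rows:
        # Each row counts toward its own value and toward the "all"
        # rollup in each dimension, giving 2 x 2 = 4 keys per row.
        for p, a in product((project, "all"), (agent, "all")):
            totals[(p, a, day, hour)] += views
    return totals


def lookup(totals, path):
    """Resolve a path shaped like /{project|all}/{agent|all}/{day}/{hour}."""
    project, agent, day, hour = path.strip("/").split("/")
    return totals[(project, agent, day, int(hour))]


# Made-up sample data, purely for illustration.
rows = [
    ("en.wikipedia", "user", "2015-06-09", 11, 120),
    ("en.wikipedia", "spider", "2015-06-09", 11, 30),
    ("de.wikipedia", "user", "2015-06-09", 11, 50),
]
totals = preaggregate(rows)

print(lookup(totals, "/all/all/2015-06-09/11"))           # 200
print(lookup(totals, "/en.wikipedia/all/2015-06-09/11"))  # 150
print(lookup(totals, "/all/user/2015-06-09/11"))          # 170
```

With d dimensions each row fans out to 2^d keys, so this stays cheap only while the number of dimensions and their cardinality remain small, which is the case being discussed here.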
For the article-level data, no, we'd want just basic timeseries querying.
Thanks, Gabriel. If you could point us to an example of these secondary RESTBase indices, that'd be interesting.
The API used to define these tables is described in https://github.com/wikimedia/restbase/blob/master/doc/TableStorageAPI.md . The algorithm used to keep those indexes up to date is described in https://github.com/wikimedia/restbase-mod-table-cassandra/blob/master/doc/Se... and largely implemented in https://github.com/wikimedia/restbase-mod-table-cassandra/blob/master/lib/se... .