Hi all,

Here are my thoughts and opinions on the entry-point definition.

While our long-term plan is to provide 'on-the-fly per-query computation', for now we pre-aggregate every dataset we want to serve and store it in Cassandra, to be exposed through RESTBase.

This means we can't easily provide aggregation over variable start/end dates.

We could either

- send every dataset between the start and end dates for a given time granularity level (this could be big!).

- use an entry point such as '/top/{project}/{access}/{year}/{month}/{day}', with the possibility of skipping the 'day' parameter to get the full month.
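
To make the second option concrete, here is a minimal sketch of how a client might build such a path. It is only an illustration, not an actual implementation: the function name `top_path`, the zero-padding of month and day, and the example values "en.wikipedia" and "all-access" are all assumptions on my side.

```python
from typing import Optional


def top_path(project: str, access: str, year: int, month: int,
             day: Optional[int] = None) -> str:
    """Build a '/top/{project}/{access}/{year}/{month}/{day}' path.

    Hypothetical helper: leaving out `day` asks for the full-month
    dataset, as proposed above. Zero-padding is an assumed convention.
    """
    parts = ["top", project, access, f"{year:04d}", f"{month:02d}"]
    if day is not None:
        parts.append(f"{day:02d}")
    return "/" + "/".join(parts)


# A single day, then the whole month (example values are made up).
print(top_path("en.wikipedia", "all-access", 2015, 9, 11))
print(top_path("en.wikipedia", "all-access", 2015, 9))
```

With this shape, each request maps to exactly one pre-aggregated dataset, so nothing has to be computed at query time.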

@Thomas:

- As Andrew said, the data we have so far is pre-aggregated at the hour level.

- The data is tagged in the UTC timezone, and we plan for requests to use that timezone by default.

- As said in this message, we are thinking of ways to provide better access to the data (on-the-fly computation, lower time granularity, and others), and this involves both technical and privacy concerns. That will be for the future :)
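
To illustrate why hourly UTC bins only go so far, here is a rough sketch that lists the UTC hour bins overlapping one local day. It is not how our pipeline works: the helper `utc_hour_bins` is hypothetical, and it assumes a fixed UTC offset, ignoring daylight-saving changes.

```python
from datetime import date, datetime, timedelta, timezone


def utc_hour_bins(local_day: date, offset: timedelta):
    """Return (bins, aligned) for one local day at a fixed UTC offset.

    `bins` holds the start of every UTC hour bin that overlaps the
    local day; `aligned` is True only when the local day starts exactly
    on a UTC hour boundary. Hypothetical helper, fixed offsets only.
    """
    tz = timezone(offset)
    start = datetime(local_day.year, local_day.month, local_day.day,
                     tzinfo=tz).astimezone(timezone.utc)
    end = start + timedelta(days=1)
    aligned = start.minute == 0 and start.second == 0
    # Round down to the UTC hour bin that contains the local midnight.
    t = start.replace(minute=0, second=0, microsecond=0)
    bins = []
    while t < end:
        bins.append(t)
        t += timedelta(hours=1)
    return bins, aligned


# Whole-hour offset (UTC+2): 24 bins that match the local day exactly.
bins, aligned = utc_hour_bins(date(2015, 9, 11), timedelta(hours=2))
print(len(bins), aligned)

# Half-hour offset (UTC+5:30): 25 bins, the first and last only partly
# inside the local day, so hourly data cannot be re-cut exactly.
bins, aligned = utc_hour_bins(date(2015, 9, 11), timedelta(hours=5, minutes=30))
print(len(bins), aligned)
```

For whole-hour offsets, the 24 hourly datasets can simply be summed; for offsets such as UTC+5:30, the two edge bins would have to be split, which the pre-aggregated hourly data does not allow.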

Joseph

On Sun, Sep 13, 2015 at 5:39 PM, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:

On 13 September 2015 at 16:26, Thomas Steiner <tomac@google.com> wrote:
> I mean that somehow I could express getting data in an exact given period of
> time, say, exactly the day September 11, 2015 in the time zone CET (that day
> started at 3pm relative to PDT or 11pm relative to UTC). Without time zone
> support, I would get data “outside” of my desired local time zone. Hope this
> makes sense and is clear.

A cautious note on time zones...

If you're holding everything in one hour bins, as we currently do with
the aggregated data, then it's easy enough to switch from UTC to CET
to EST and so forth.

But not all time zones differ by one hour increments. Most noticeably,
India is on UTC+5:30, and a handful of other places also differ by 30
minutes from the standard (or in the case of Nepal, 45). I'm not sure
you could display these without regenerating the underlying data,
which would be a lot of added complexity.

--
- Andrew Gray
andrew.gray@dunelm.org.uk

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

Joseph Allemandou

Data Engineer @ Wikimedia Foundation

IRC: joal