Re: [Analytics] The Unified Logging Infrastructure for Data Analytics at Twitter

22 Jan 2013


Yes! We've talked a bit about this paper when thinking about the structure of our data storage and processing. To me the path Twitter followed seems very reasonable, so it's encouraging to hear that it looks that way to someone who gets their hands dirty with data on a daily basis.
As it stands now, we weren't planning on enforcing any schema requirements in Kraken, but it'd be interesting to experiment with a standardized event-data format if y'all were in favor of it. Our most recent pass at a schema[1] -- mostly for binary serialization, to save bits -- has an otherwise-untyped (String-String) map for the key/value pairs of the data payload. We intended to use an additional, optional field to permit specifying a sub-schema that applies strong typing to incoming event data. (We plan on storing things with Avro, but it's easy enough to convert between Avro schemas and JSON Schema.) Event subclasses would be more flexible, but they would require custom processing for each class. I'd normally oppose a standard model (Google doesn't use one internally, for example), but since Twitter made it work, I think it's worth exploring.
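To make the "untyped map plus optional sub-schema" idea concrete, here is a minimal Python sketch under stated assumptions. The `Event` type, the `SUBSCHEMAS` registry, the `typed_payload` function and the `ExampleEdit` schema with its fields are all hypothetical, invented for illustration; they are not taken from the actual Kraken schema, and the sketch uses plain Python types rather than Avro.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def parse_bool(value: str) -> bool:
    # Accept only the literal strings "true" and "false".
    if value not in ("true", "false"):
        raise ValueError(f"not a boolean: {value!r}")
    return value == "true"


# Hypothetical sub-schema registry: schema name -> {field name -> parser}.
# The schema and field names are illustrative, not taken from Kraken.
SUBSCHEMAS: Dict[str, Dict[str, Callable[[str], object]]] = {
    "ExampleEdit": {"page_id": int, "duration_ms": float, "is_anon": parse_bool},
}


@dataclass
class Event:
    # Untyped String-String map carrying the event's data payload.
    payload: Dict[str, str]
    # Optional name of a sub-schema that applies strong typing to the payload.
    schema: Optional[str] = None


def typed_payload(event: Event) -> Dict[str, object]:
    """Parse payload values according to the event's sub-schema, if any."""
    if event.schema is None:
        # No sub-schema: pass the payload through as plain strings.
        return dict(event.payload)
    parsers = SUBSCHEMAS.get(event.schema)
    if parsers is None:
        raise ValueError(f"unknown sub-schema: {event.schema!r}")
    missing = sorted(set(parsers) - set(event.payload))
    if missing:
        raise ValueError(f"missing fields for {event.schema}: {missing}")
    # Fields the sub-schema names are parsed; any extra keys stay strings.
    typed: Dict[str, object] = dict(event.payload)
    for name, parse in parsers.items():
        typed[name] = parse(event.payload[name])
    return typed


if __name__ == "__main__":
    raw = Event(
        payload={"page_id": "42", "duration_ms": "812.5", "is_anon": "false"},
        schema="ExampleEdit",
    )
    print(typed_payload(raw))
    # {'page_id': 42, 'duration_ms': 812.5, 'is_anon': False}
```

With one generic envelope, the typing step is shared code driven by the registry, whereas event subclasses would each need their own handler; that is the flexibility-versus-custom-processing trade-off described above.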
Thoughts?
[1] https://www.mediawiki.org/wiki/Analytics/Kraken/Data_Formats#Event_Data_Sche...
-- 
David Schoonover
dsc@wikimedia.org


On Thursday, 17 January 2013 at 2:00 p, Dario Taraborelli wrote:

> http://arxiv.org/pdf/1208.4171.pdf
> 
> This is a pretty interesting and accessible description of best practices and design decisions driven by practical problems they had to solve at Twitter in the area of client-side event logging, funnel analysis, and user modeling.
> E3: check out section "3.2 Client Events" in particular, which is quite relevant to EventLogging.
> 
> Dario
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
