+1 to Dario's mention of the many schemas that just capture production DB stuff in a better way.  

Re. Growth: Old Growth experiment schemas continue to be a great resource for checking old work and sometimes even for testing new hypotheses. When Dario and Kevin get around to us, I'll have a complete list of schemas that should not be purged.

Re. storage parameters in the schema: I agree with Ori, but I'd still like to have them on the wiki somehow. If we were a bunch of Wikipedia editors, I'd suggest making a template for the talk page of a schema that captures this metadata. Since a template probably isn't the best fit and we'd probably like to stick to JSON, maybe a subpage would be in order.

Such a pattern would allow for changes to storage restrictions without changing the rev_id of the schema page (data type).  
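
To make the subpage idea a bit more concrete, here is a rough Python sketch. Everything in it is made up for illustration: the class names, the "ExampleSchema" page, the rev_id value, and the "purge" / "retain_days" fields are assumptions, not an existing format or API. It only shows the separation I have in mind, where the storage metadata lives in its own JSON blob and editing it leaves the schema revision alone.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaPage:
    """Hypothetical schema page; rev_id identifies the data type."""
    name: str
    rev_id: int
    definition: str  # JSON text of the schema itself


@dataclass
class StorageSubpage:
    """Hypothetical subpage holding storage metadata as JSON."""
    schema_name: str
    content: str  # JSON text, edited independently of the schema page

    def update(self, **changes):
        # Merge the changes into the stored JSON and re-serialize it.
        data = json.loads(self.content)
        data.update(changes)
        self.content = json.dumps(data, sort_keys=True)


# Illustrative values only; none of these come from a real schema.
schema = SchemaPage(
    name="ExampleSchema",
    rev_id=1234,
    definition=json.dumps({"properties": {"userId": {"type": "integer"}}}),
)
storage = StorageSubpage(
    schema_name="ExampleSchema",
    content=json.dumps({"purge": False, "retain_days": None}),
)

# Changing the storage restrictions touches only the subpage.
storage.update(purge=True, retain_days=90)
```

After the update, schema.rev_id is still 1234, so anything keyed on the schema revision would be unaffected by the change in storage policy.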


On Thu, May 29, 2014 at 1:26 AM, Steven Walling <swalling@wikimedia.org> wrote:

On Wed, May 28, 2014 at 10:50 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
I just announced this potential change in Scrum of Scrums and the Mobile team said they also would like to keep old data, but not for all of their schemas.  They're cleaning up their graphs and we should check with them when we start deleting.

Following up on this from the Growth perspective...

My main question is what the rationale is. Is it to improve query performance on analytics dbs?

I do know there are many older schemas for Growth-related experiments that are only really useful for historical analysis, which is kind of hard to reconstruct anyway. If there are sound technical reasons to chuck stuff from the relational dbs and retain it only in the raw JSON logs, then I'm potentially okay with helping figure out a list of schemas to retain and schemas to purge. Aaron, thoughts?

Steven Walling,
Product Manager

Analytics mailing list