Nuria, I believe that Dario already did that[1].
1.
I see, I thought the concern was privacy rather than capacity. In that case
we should put an item in our backlog to sort out schemas and find the ones
whose data can be deleted. I will file an item to this effect.
In the future we will hopefully have this metadata about the schemas
available somewhere.
On May 30, 2014, at 8:03 AM, Sean Pringle <springle(a)wikimedia.org> wrote:
On Fri, May 30, 2014 at 3:28 PM, Ori Livneh <ori(a)wikimedia.org> wrote:
On Wed, May 28, 2014 at 11:26 PM, Steven Walling <swalling(a)wikimedia.org> wrote:
My main question is what the rationale is. Is it to improve query
performance on analytics dbs?
I imagine it will help, but it's probably not the primary reason. I
imagine Sean would like to have the database in a state of equilibrium such
that there are no looming dangers, and no reason in principle why things
couldn't just keep running. At the moment the rate of incoming events is
prone to sharp fluctuations, and there is no protocol in place for handling
exhausted server capacity.
Correct.
It's not really about performance since the dataset will be larger than
$memory regardless.
Of course, if you guys decide that specific data needs to stay around
forever, that's fine; it helps with capacity planning, and we just bite the
bullet and ensure sufficient storage space is available. Having a default
purge-after-X-months policy for new tables would be the baseline.
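For illustration only, here is a minimal sketch of what such a default purge job could look like. The table and column names (`SampleSchema_123`, `ts`) and the 3-month default are hypothetical, and an in-memory SQLite database stands in for the production store; it is not the actual policy or tooling under discussion.

```python
# Hypothetical purge-after-X-months job, sketched against SQLite.
# Assumptions: timestamps are stored as ISO-8601 text, and a "month"
# is approximated as 30 days for the retention cutoff.
import sqlite3
from datetime import datetime, timedelta

RETENTION_MONTHS = 3  # hypothetical default retention window


def purge_old_rows(conn, table, ts_column, months=RETENTION_MONTHS):
    """Delete rows whose timestamp is older than the retention window;
    return the number of rows purged."""
    cutoff = datetime.now() - timedelta(days=30 * months)
    cur = conn.execute(
        f"DELETE FROM {table} WHERE {ts_column} < ?",
        (cutoff.isoformat(sep=" "),),
    )
    conn.commit()
    return cur.rowcount


# Demo: one year-old row (purged) and one fresh row (kept).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SampleSchema_123 (id INTEGER, ts TEXT)")
old_ts = (datetime.now() - timedelta(days=365)).isoformat(sep=" ")
new_ts = datetime.now().isoformat(sep=" ")
conn.executemany(
    "INSERT INTO SampleSchema_123 VALUES (?, ?)",
    [(1, old_ts), (2, new_ts)],
)
purged = purge_old_rows(conn, "SampleSchema_123", "ts")
remaining = conn.execute(
    "SELECT COUNT(*) FROM SampleSchema_123"
).fetchone()[0]
```

In practice a job like this would be scheduled (cron, or the database's own event scheduler) and batched to avoid long-running deletes, but the core idea is just a timestamp cutoff per table.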
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics