I see, I thought the concern was privacy rather than capacity. In that case we should put
an item in our backlog to sort out schemas and find the ones whose data can be deleted. I
will file an item to this effect.
In the future we will hopefully have this metadata about each schema available somewhere.
On May 30, 2014, at 8:03 AM, Sean Pringle <springle(a)wikimedia.org> wrote:
On Fri, May 30, 2014 at 3:28 PM, Ori Livneh
On Wed, May 28, 2014 at 11:26 PM, Steven Walling
My main question is what the rationale is. Is it
to improve query performance on analytics dbs?
I imagine it will help, but it's probably not the primary reason. I imagine Sean
would like to have the database in a state of equilibrium such that there are no looming
dangers, and no reason in principle why things couldn't just keep running. At the
moment the clip of incoming events is prone to sharp fluctuations and there is no protocol
in place for handling exhausted server capacity.
It's not really about performance, since the dataset will be larger than $memory.
Of course, if you guys decide that specific data needs to stay around for ever,
that's fine; it helps with capacity planning and we just bite the bullet and ensure
sufficient storage space is available. Having a default purge-after-X-months policy for
new tables would be the baseline.