Just to be clear:

I'm interested in identifying the expected growth bounds rather than limiting tables arbitrarily.

If someone knows that X months or years of data are required for certain tables, feel free to speak up, and Ops will ensure the necessary storage capacity is planned in time.

On Thu, Jul 3, 2014 at 2:57 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:

I have the feeling there’s no need to keep 114 GB of raw client-side instrumentation data for tofu detection.
Copying Amir, Gilles and Jon who are the respective owners of the schemas in Sean’s list.

On Jul 2, 2014, at 7:44 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:

The odd name is frustrating to me too :/. I'd be interested to see if we need the MV tables (or the really old data in them): as I understand it, those are aggregated for public consumption fairly regularly.

On 2 July 2014 22:21, Sean Pringle <springle@wikimedia.org> wrote:

Hi :)

The following table is easily the largest in eventlogging and growing fastest:

114G     UniversalLanguageSelector-tofu_7629564

Is there a plan for purging old data from this one? I realize it's mostly new data; just wondering if growth will be unbounded.

Why does it have an odd name "-tofu"? Is it intended?

There is a duplicate table called UniversalLanguageSelecTor-tofu_7629564 -- note the uppercase T -- with a single row. Is that needed?

The next biggest are:

67G     PageContentSaveComplete_5588433.ibd
61G     MediaViewer_8572637.ibd
57G     MediaViewer_8245578.ibd
33G     MobileWebClickTracking_5929948.ibd

BR
Sean

---
DBA @ WMF

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

DBA @ WMF