Just to be clear:
I'm interested in identifying the expected growth bounds rather than limiting tables arbitrarily.
If anyone knows that X months or years of data are required for certain tables, please speak up so Ops can plan the necessary storage capacity in time.
On Thu, Jul 3, 2014 at 2:57 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
I suspect there’s no need to keep 114 GB of raw client-side instrumentation data for tofu detection. Copying Amir, Gilles and Jon, who own the respective schemas in Sean’s list.
On Jul 2, 2014, at 7:44 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
The odd name is frustrating to me too :/. I'd be interested to see whether we need the MV tables (or at least the really old data in them): as I understand it, those are aggregated for public consumption fairly regularly.
On 2 July 2014 22:21, Sean Pringle <springle@wikimedia.org> wrote:
Hi :)
The following table is easily the largest in eventlogging and growing fastest:
114G UniversalLanguageSelector-tofu_7629564
Is there a plan for purging old data from this one? I realize it's mostly new data; I'm just wondering whether growth will be unbounded.
Why does it have the odd "-tofu" suffix? Is that intentional?
There is a duplicate table called UniversalLanguageSelecTor-tofu_7629564 -- note the uppercase T -- with a single row. Is that needed?
The next biggest are:
67G PageContentSaveComplete_5588433.ibd
61G MediaViewer_8572637.ibd
57G MediaViewer_8245578.ibd
33G MobileWebClickTracking_5929948.ibd
BR Sean
DBA @ WMF
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
--
Oliver Keyes
Research Analyst
Wikimedia Foundation