On Tue, Aug 11, 2015 at 12:29 PM, Jon Katz <jkatz@wikimedia.org> wrote:
However, it seems that >90% of the clicks are coming from the article table (or the addition of search tracking created the bloat), and MobileWebUIClickTracking_10742159 is now approaching 300 GB. Mostly this is due to search. I would encourage further sampling, but that would mean losing the beta data. Perhaps we could split it into separate beta/stable tables and then sample stable? Any other ideas?

Add a samplingRatio field to the schema, add a PHP global to control the sampling ratio, set it appropriately for each site via operations/mediawiki-config, and in the SQL query used for the dashboards replace count(*) with sum(event_samplingRatio). We did that for MediaViewer and it worked great.
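
Roughly, the dashboard side would look something like this, assuming event_samplingRatio stores the inverse sampling rate (e.g. 100 for 1-in-100 sampling, 1 for unsampled beta rows) and the standard EventLogging timestamp column; the per-day grouping is just for illustration:

    -- before: every click is logged, so a plain row count works
    SELECT LEFT(timestamp, 8) AS day, COUNT(*) AS clicks
    FROM MobileWebUIClickTracking_10742159
    GROUP BY day;

    -- after: each sampled row stands in for event_samplingRatio clicks,
    -- so summing the field estimates the true total
    SELECT LEFT(timestamp, 8) AS day, SUM(event_samplingRatio) AS clicks
    FROM MobileWebUIClickTracking_10742159
    GROUP BY day;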

Also, if your main concern is table size (for us it was mainly server load), you can just run a script periodically that replaces the user agent and the URL with an empty string. Those two probably take up most of the storage space; every other field is fairly short.
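
Something along these lines, run from cron; userAgent is the standard EventLogging capsule column, event_url is just a placeholder for whatever the URL field is actually called in this schema, and the 90-day cutoff is only an example retention window:

    -- blank the two long text columns on older rows to save space
    -- (column names apart from userAgent/timestamp are placeholders)
    UPDATE MobileWebUIClickTracking_10742159
    SET userAgent = '', event_url = ''
    WHERE timestamp < DATE_FORMAT(NOW() - INTERVAL 90 DAY, '%Y%m%d%H%i%s');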