Hi folks, back in January, MobileWebClickTracking had grown to over 200 GB, making it too slow to be usable. As a result, we split it into 3 separate tables.
However, it seems that >90% of the clicks are coming from the article table (or adding search created bloat), and MobileWebUIClickTracking_10742159 is now approaching 300 GB. Mostly this is due to search. I would encourage further sampling, but that would mean that beta data would be lost. Perhaps we can split it into separate beta/stable tables and then sample stable? Any other ideas?
Phab ticket here: https://phabricator.wikimedia.org/T108723
-J
On Tue, Aug 11, 2015 at 12:29 PM, Jon Katz jkatz@wikimedia.org wrote:
However, it seems that >90% of the clicks are coming from the article table (or adding search created bloat), and MobileWebUIClickTracking_10742159 is now approaching 300 GB. Mostly this is due to search. I would encourage further sampling, but that would mean that beta data would be lost. Perhaps we can split it into separate beta/stable tables and then sample stable? Any other ideas?
Add a samplingRatio field to the schema, add a PHP global to control the sampling ratio, and set it via operations/mediawiki-config appropriately for each site. Then, in the SQL query used for the dashboards, replace count(*) with sum(event_samplingRatio). We did that for MediaViewer and it worked great.
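To illustrate why sum(event_samplingRatio) can stand in for count(*), here is a rough sketch in Python with an in-memory SQLite table. It assumes that a samplingRatio of N means "log 1 in N events", so each stored row represents about N real events. The table name clicks, the wiki column, and the helpers simulate and click_counts are made up for the example and are not part of the actual schema or of EventLogging.

```python
import random
import sqlite3


def simulate(ratios, events_per_wiki, seed=0):
    """Log sampled click events into an in-memory table and return the connection.

    ratios maps a (hypothetical) wiki name to N, meaning "log 1 in N events".
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (wiki TEXT, event_samplingRatio INTEGER)")
    for wiki, ratio in ratios.items():
        for _ in range(events_per_wiki):
            # Keep the event with probability 1/ratio; ratio 1 keeps everything.
            if rng.randrange(ratio) == 0:
                conn.execute("INSERT INTO clicks VALUES (?, ?)", (wiki, ratio))
    return conn


def click_counts(conn):
    """Return {wiki: (stored_rows, estimated_events)}."""
    rows = conn.execute(
        "SELECT wiki, COUNT(*), SUM(event_samplingRatio) "
        "FROM clicks GROUP BY wiki"
    )
    return {wiki: (stored, estimated) for wiki, stored, estimated in rows}
```

Calling click_counts(simulate({"beta": 1, "stable": 100}, 10000)) would keep every beta row while storing roughly 1% of the stable rows, and the SUM column still estimates the full stable volume. Because the ratio is recorded on each row, it can be changed per site later without breaking older dashboard numbers.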
Also, if your main concern is table size (for us it was mainly server load), you can just run a script periodically to replace the user agent and the URL with an empty string. Those probably take up most of the storage space; every other field is fairly short.
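A periodic cleanup along those lines could look something like the sketch below. It uses Python's sqlite3 module purely so the example is self-contained; the real tables live elsewhere and would need a different connection. The column names timestamp, userAgent and event_url, the 14-digit timestamp string format, and the helper name blank_bulky_fields are all assumptions for illustration.

```python
import sqlite3


def blank_bulky_fields(conn, table, cutoff_timestamp):
    """Blank the user agent and URL on rows older than cutoff_timestamp.

    Assumes timestamps are fixed-width strings such as "20150811122900",
    so plain string comparison orders them chronologically.
    Returns the number of rows that were changed.
    """
    # The table name is interpolated directly, so it must come from trusted
    # configuration, never from user input.
    cur = conn.execute(
        f"UPDATE {table} SET userAgent = '', event_url = '' "
        "WHERE timestamp < ? AND (userAgent != '' OR event_url != '')",
        (cutoff_timestamp,),
    )
    conn.commit()
    return cur.rowcount
```

The extra condition in the WHERE clause skips rows that were already blanked, so repeated runs only touch newly aged rows.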
On Tue, Aug 11, 2015 at 2:59 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Gergo Tisza, 11/08/2015 22:44:
replace the user agent and the URL with an empty string.
Is there still no way to avoid storing those fields altogether? They are both nasty.
User agent is a default field and cannot be removed. I opened T108757 https://phabricator.wikimedia.org/T108757 about that. URL is specific to this schema and presumably there for a reason.