Sean, fully agreed this is a problem. The way I see it, I don't think there's an easy solution. This query is made with a template that fills in different versions of an Event Logging schema. This is bad for more than just this performance reason:
* when a new schema revision is implemented, either dozens of queries have to be updated or we have to live with this crappy performance * when an old schema revision table is deleted, queries break regardless of approach * when querying ad-hoc, this is yet another stumbling block to getting at the data
We are currently focused on moving Event Logging to Kafka, and I think that the way we handle Event Logging analytics will change too much too soon for us to address this. If it's a problem for the database, I'd rather just disable the queries for now. Let me know and I'll make a judgement call to either go through and fix them all or disable them all.
On Wed, Jun 24, 2015 at 8:43 AM, Sean Pringle springle@wikimedia.org wrote:
Below are a couple examples of recent slow queries running on analytics-store. They both use subqueries and UNION. Neither push the WHERE conditions down to the subqueries, which means indexes are not used. If you include WHERE clauses in both inner and outer queries you will achieve index usage, especially for these sorts of eventlogging queries on event_action or wiki.
MySQL / MariaDB query optimizer does do constant propagation, but UNION tends to trip it up in all sorts of exciting ways. Be verbose, abandon DRY principle, and make your life easier :-)
Current query:
SELECT Month.Date, COALESCE(Web.Web, 0) AS Web FROM ( SELECT DATE_FORMAT( ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY, @num:=@num+1), '%Y-%m-%d' ) AS Date FROM seq_1_to_100, (SELECT @num:=-1) num LIMIT 30 ) AS Month LEFT JOIN ( SELECT DATE(timestamp) AS Date, SUM(1) AS Web FROM ( SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_5644223 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_6077315 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_6637866 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_7675117 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_8599025 ) as MobileWebEditing WHERE event_namespace != 0 AND event_action = 'success' AND wiki != 'testwiki' GROUP BY Date ) AS Web ON Month.Date = Web.Date;
Version with WHERE clauses pushed down to subqueries:
SELECT Month.Date, COALESCE(Web.Web, 0) AS Web FROM ( SELECT DATE_FORMAT( ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY, @num:=@num+1), '%Y-%m-%d' ) AS Date FROM seq_1_to_100, (SELECT @num:=-1) num LIMIT 30 ) AS Month LEFT JOIN ( SELECT DATE(timestamp) AS Date, SUM(1) AS Web FROM ( SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_5644223 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_6077315 WHERE event_namespace != 0 AND event_action = 'success' AND wiki != 'testwiki' UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_6637866 WHERE event_namespace != 0 AND event_action = 'success' AND wiki != 'testwiki' UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_7675117 WHERE event_namespace != 0 AND event_action = 'success' AND wiki != 'testwiki' UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_8599025 WHERE event_namespace != 0 AND event_action = 'success' AND wiki != 'testwiki' ) as MobileWebEditing WHERE event_namespace != 0 AND event_action = 'success' AND wiki != 'testwiki' GROUP BY Date ) AS Web ON Month.Date = Web.Date;
Current query (hours):
SELECT day, `Attempted saves` / `Loads` AS "All wikis", `Attempted saves - dewiki` / `Loads - dewiki` AS "dewiki", `Attempted saves - enwiki` / `Loads - enwiki` AS "enwiki", `Attempted saves - eswiki` / `Loads - eswiki` AS "eswiki", `Attempted saves - frwiki` / `Loads - frwiki` AS "frwiki", `Attempted saves - ptwiki` / `Loads - plwiki` AS "plwiki", `Attempted saves - plwiki` / `Loads - ptwiki` AS "ptwiki" FROM ( SELECT DATE( timestamp ) AS day, SUM( IF( event_action = 'ready', 1, 0 ) ) AS "Loads", SUM( IF( event_action = 'ready' AND wiki = 'dewiki', 1, 0 ) ) AS "Loads - dewiki", SUM( IF( event_action = 'ready' AND wiki = 'enwiki', 1, 0 ) ) AS "Loads - enwiki", SUM( IF( event_action = 'ready' AND wiki = 'eswiki', 1, 0 ) ) AS "Loads - eswiki", SUM( IF( event_action = 'ready' AND wiki = 'frwiki', 1, 0 ) ) AS "Loads - frwiki", SUM( IF( event_action = 'ready' AND wiki = 'plwiki', 1, 0 ) ) AS "Loads - plwiki", SUM( IF( event_action = 'ready' AND wiki = 'ptwiki', 1, 0 ) ) AS "Loads - ptwiki", SUM( IF( event_action = 'saveAttempt', 1, 0 ) ) AS "Attempted saves", SUM( IF( event_action = 'saveAttempt' AND wiki = 'dewiki', 1, 0 ) ) AS "Attempted saves - dewiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'enwiki', 1, 0 ) ) AS "Attempted saves - enwiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'eswiki', 1, 0 ) ) AS "Attempted saves - eswiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'frwiki', 1, 0 ) ) AS "Attempted saves - frwiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'plwiki', 1, 0 ) ) AS "Attempted saves - plwiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'ptwiki', 1, 0 ) ) AS "Attempted saves - ptwiki" FROM ( SELECT event_action, wiki, timestamp FROM Edit_10604157 UNION ALL SELECT event_action, wiki, timestamp FROM Edit_10676603 ) AS valid_edit_schema_revisions GROUP BY day ORDER BY day ) as internalQuery;
Version with WHERE clauses pushed down to subqueries:
SELECT day, `Attempted saves` / `Loads` AS "All wikis", `Attempted saves - dewiki` / `Loads - dewiki` AS "dewiki", `Attempted saves - enwiki` / `Loads - enwiki` AS "enwiki", `Attempted saves - eswiki` / `Loads - eswiki` AS "eswiki", `Attempted saves - frwiki` / `Loads - frwiki` AS "frwiki", `Attempted saves - ptwiki` / `Loads - plwiki` AS "plwiki", `Attempted saves - plwiki` / `Loads - ptwiki` AS "ptwiki" FROM ( SELECT DATE( timestamp ) AS day, SUM( IF( event_action = 'ready', 1, 0 ) ) AS "Loads", SUM( IF( event_action = 'ready' AND wiki = 'dewiki', 1, 0 ) ) AS "Loads - dewiki", SUM( IF( event_action = 'ready' AND wiki = 'enwiki', 1, 0 ) ) AS "Loads - enwiki", SUM( IF( event_action = 'ready' AND wiki = 'eswiki', 1, 0 ) ) AS "Loads - eswiki", SUM( IF( event_action = 'ready' AND wiki = 'frwiki', 1, 0 ) ) AS "Loads - frwiki", SUM( IF( event_action = 'ready' AND wiki = 'plwiki', 1, 0 ) ) AS "Loads - plwiki", SUM( IF( event_action = 'ready' AND wiki = 'ptwiki', 1, 0 ) ) AS "Loads - ptwiki", SUM( IF( event_action = 'saveAttempt', 1, 0 ) ) AS "Attempted saves", SUM( IF( event_action = 'saveAttempt' AND wiki = 'dewiki', 1, 0 ) ) AS "Attempted saves - dewiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'enwiki', 1, 0 ) ) AS "Attempted saves - enwiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'eswiki', 1, 0 ) ) AS "Attempted saves - eswiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'frwiki', 1, 0 ) ) AS "Attempted saves - frwiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'plwiki', 1, 0 ) ) AS "Attempted saves - plwiki", SUM( IF( event_action = 'saveAttempt' AND wiki = 'ptwiki', 1, 0 ) ) AS "Attempted saves - ptwiki" FROM ( SELECT event_action, wiki, timestamp FROM Edit_10604157 WHERE event_action in ('saveAttempt', 'ready') UNION ALL SELECT event_action, wiki, timestamp FROM Edit_10676603 WHERE event_action in ('saveAttempt', 'ready') ) AS valid_edit_schema_revisions GROUP BY day ORDER BY day ) as internalQuery;
-- DBA @ WMF
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics