Sean, fully agreed this is a problem. The way I see it, I don't think
there's an easy solution. This query is made with a template that fills in
different versions of an Event Logging schema. This is bad for more than
just this performance reason:
* when a new schema revision is implemented, either dozens of queries have
to be updated or we have to live with this crappy performance
* when an old schema revision table is deleted, queries break regardless of
approach
* when querying ad-hoc, this is yet another stumbling block to getting at
the data
We are currently focused on moving Event Logging to Kafka, and I think that
the way we handle Event Logging analytics will change too much too soon for
us to address this. If it's a problem for the database, I'd rather just
disable the queries for now. Let me know and I'll make a judgement call to
either go through and fix them all or disable them all.
On Wed, Jun 24, 2015 at 8:43 AM, Sean Pringle <springle(a)wikimedia.org>
wrote:
Below are a couple examples of recent slow queries
running on
analytics-store. They both use subqueries and UNION. Neither push the
WHERE conditions down to the subqueries, which means indexes are not
used. If you include WHERE clauses in both inner and outer queries you
will achieve index usage, especially for these sorts of eventlogging
queries on event_action or wiki.
MySQL / MariaDB query optimizer does do constant propagation, but
UNION tends to trip it up in all sorts of exciting ways. Be verbose,
abandon DRY principle, and make your life easier :-)
Current query:
SELECT Month.Date, COALESCE(Web.Web, 0) AS Web FROM (
SELECT DATE_FORMAT( ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY,
@num:=@num+1), '%Y-%m-%d' ) AS Date
FROM seq_1_to_100, (SELECT @num:=-1) num LIMIT 30 ) AS Month LEFT
JOIN (
SELECT DATE(timestamp) AS Date,
SUM(1) AS Web FROM (
SELECT timestamp, wiki, event_username, event_action,
event_namespace, event_userEditCount FROM MobileWebEditing_5644223
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
FROM MobileWebEditing_6077315
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
from MobileWebEditing_6637866
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
from MobileWebEditing_7675117
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
from MobileWebEditing_8599025
) as MobileWebEditing
WHERE event_namespace != 0 AND event_action = 'success' AND wiki
!= 'testwiki'
GROUP BY Date
) AS Web ON Month.Date = Web.Date;
Version with WHERE clauses pushed down to subqueries:
SELECT Month.Date, COALESCE(Web.Web, 0) AS Web FROM (
SELECT DATE_FORMAT( ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY,
@num:=@num+1), '%Y-%m-%d' ) AS Date
FROM seq_1_to_100, (SELECT @num:=-1) num LIMIT 30 ) AS Month LEFT
JOIN (
SELECT DATE(timestamp) AS Date,
SUM(1) AS Web FROM (
SELECT timestamp, wiki, event_username, event_action,
event_namespace, event_userEditCount FROM MobileWebEditing_5644223
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
FROM MobileWebEditing_6077315 WHERE event_namespace !=
0 AND event_action = 'success' AND wiki != 'testwiki'
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
from MobileWebEditing_6637866 WHERE event_namespace !=
0 AND event_action = 'success' AND wiki != 'testwiki'
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
from MobileWebEditing_7675117 WHERE event_namespace !=
0 AND event_action = 'success' AND wiki != 'testwiki'
UNION SELECT timestamp, wiki, event_username,
event_action, event_namespace, event_userEditCount
from MobileWebEditing_8599025 WHERE event_namespace !=
0 AND event_action = 'success' AND wiki != 'testwiki'
) as MobileWebEditing
WHERE event_namespace != 0 AND event_action = 'success' AND wiki
!= 'testwiki'
GROUP BY Date
) AS Web ON Month.Date = Web.Date;
Current query (hours):
SELECT day,
`Attempted saves` / `Loads` AS "All wikis",
`Attempted saves - dewiki` / `Loads - dewiki` AS "dewiki",
`Attempted saves - enwiki` / `Loads - enwiki` AS "enwiki",
`Attempted saves - eswiki` / `Loads - eswiki` AS "eswiki",
`Attempted saves - frwiki` / `Loads - frwiki` AS "frwiki",
`Attempted saves - ptwiki` / `Loads - plwiki` AS "plwiki",
`Attempted saves - plwiki` / `Loads - ptwiki` AS "ptwiki" FROM (
SELECT DATE( timestamp ) AS day,
SUM( IF( event_action = 'ready', 1, 0 ) ) AS "Loads",
SUM( IF( event_action = 'ready' AND wiki = 'dewiki', 1, 0 ) ) AS
"Loads - dewiki",
SUM( IF( event_action = 'ready' AND wiki = 'enwiki', 1, 0 ) ) AS
"Loads - enwiki",
SUM( IF( event_action = 'ready' AND wiki = 'eswiki', 1, 0 ) ) AS
"Loads - eswiki",
SUM( IF( event_action = 'ready' AND wiki = 'frwiki', 1, 0 ) ) AS
"Loads - frwiki",
SUM( IF( event_action = 'ready' AND wiki = 'plwiki', 1, 0 ) ) AS
"Loads - plwiki",
SUM( IF( event_action = 'ready' AND wiki = 'ptwiki', 1, 0 ) ) AS
"Loads - ptwiki",
SUM( IF( event_action = 'saveAttempt', 1, 0 ) ) AS "Attempted
saves",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'dewiki', 1, 0 )
) AS "Attempted saves - dewiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'enwiki', 1, 0 )
) AS "Attempted saves - enwiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'eswiki', 1, 0 )
) AS "Attempted saves - eswiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'frwiki', 1, 0 )
) AS "Attempted saves - frwiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'plwiki', 1, 0 )
) AS "Attempted saves - plwiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'ptwiki', 1, 0 )
) AS "Attempted saves - ptwiki" FROM (
SELECT event_action, wiki, timestamp
FROM Edit_10604157
UNION ALL
SELECT event_action, wiki, timestamp
FROM Edit_10676603
) AS valid_edit_schema_revisions
GROUP BY day
ORDER BY day
) as internalQuery;
Version with WHERE clauses pushed down to subqueries:
SELECT day,
`Attempted saves` / `Loads` AS "All wikis",
`Attempted saves - dewiki` / `Loads - dewiki` AS "dewiki",
`Attempted saves - enwiki` / `Loads - enwiki` AS "enwiki",
`Attempted saves - eswiki` / `Loads - eswiki` AS "eswiki",
`Attempted saves - frwiki` / `Loads - frwiki` AS "frwiki",
`Attempted saves - ptwiki` / `Loads - plwiki` AS "plwiki",
`Attempted saves - plwiki` / `Loads - ptwiki` AS "ptwiki" FROM (
SELECT DATE( timestamp ) AS day,
SUM( IF( event_action = 'ready', 1, 0 ) ) AS "Loads",
SUM( IF( event_action = 'ready' AND wiki = 'dewiki', 1, 0 ) ) AS
"Loads - dewiki",
SUM( IF( event_action = 'ready' AND wiki = 'enwiki', 1, 0 ) ) AS
"Loads - enwiki",
SUM( IF( event_action = 'ready' AND wiki = 'eswiki', 1, 0 ) ) AS
"Loads - eswiki",
SUM( IF( event_action = 'ready' AND wiki = 'frwiki', 1, 0 ) ) AS
"Loads - frwiki",
SUM( IF( event_action = 'ready' AND wiki = 'plwiki', 1, 0 ) ) AS
"Loads - plwiki",
SUM( IF( event_action = 'ready' AND wiki = 'ptwiki', 1, 0 ) ) AS
"Loads - ptwiki",
SUM( IF( event_action = 'saveAttempt', 1, 0 ) ) AS "Attempted
saves",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'dewiki', 1, 0 )
) AS "Attempted saves - dewiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'enwiki', 1, 0 )
) AS "Attempted saves - enwiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'eswiki', 1, 0 )
) AS "Attempted saves - eswiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'frwiki', 1, 0 )
) AS "Attempted saves - frwiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'plwiki', 1, 0 )
) AS "Attempted saves - plwiki",
SUM( IF( event_action = 'saveAttempt' AND wiki = 'ptwiki', 1, 0 )
) AS "Attempted saves - ptwiki" FROM (
SELECT event_action, wiki, timestamp
FROM Edit_10604157
WHERE event_action in ('saveAttempt', 'ready')
UNION ALL
SELECT event_action, wiki, timestamp
FROM Edit_10676603
WHERE event_action in ('saveAttempt', 'ready')
) AS valid_edit_schema_revisions
GROUP BY day
ORDER BY day
) as internalQuery;
--
DBA @ WMF
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics