Sean,
First, thanks for killing those queries.
Second, I've cc-ed mobile tech to talk about this problem. This query is running periodically on a scheduler. It is re-computing all data every time it runs. The mobile folks developed logic in the scheduler to time-box some queries, but problems with Event Logging data and the incredibly frequent need to re-run after data outages and backfills has left this kind of monster query as the only sane option. So our current reality is that the only sane option is insane.
Indexing these tables to make this insane query run is like the old "Why don't you turn in your crazy brother who thinks he's a chicken? I would, but I need the eggs." joke. Yes we do need the eggs, but no we're not going to get them from this query.
Marcel wrote a new scheduler that handles failures much better and is much easier to work with in terms of re-runs. That's the only option to keep this kind of data generation alive. Until we get that running, we are going to cancel this and other non-timeboxed queries. This patch: https://gerrit.wikimedia.org/r/#/c/210364/ has mobile folks on it and Sean. Sean, if this problem happens again before we take action, just merge this patch and that will stop the bleeding.
In the long run, we are trying to migrate this data to a scalable analytics platform where you can run queries like this and the system will cleverly re-compute only if the underlying data changes. Druid, Pipeline DB, etc. are candidates. The intermediate step to that will be moving the Event Logging pipeline to Kafka and Hadoop, which we are in the process of doing right now.
On Tue, May 12, 2015 at 2:12 AM, Sean Pringle springle@wikimedia.org wrote:
I just killed 100+ 3-day unindexed research queries on dbstore1002. All replication streams were lagging by nearly 1 day, and /tmp was hundreds of GB.
This seems similar to the problem from ~2 weeks ago, when old /tmp did fill up. The queries were of the following form (but with some variation). We need some indexing scheme for MobileWebEditing* tables, or to come up with a new approach.
SELECT Month.Date, COALESCE(Web.Web, 0) AS Web
-- http://stackoverflow.com/a/6871220/365238 -- ... using MariaDB 10 SEQUENCE engine instead of information_schema.columns FROM ( SELECT DATE_FORMAT( ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY, @num:=@num+1), '%Y-%m-%d' ) AS Date FROM seq_1_to_100, (SELECT @num:=-1) num LIMIT 30 ) AS Month LEFT JOIN ( SELECT DATE(timestamp) AS Date, SUM(1) AS Web
FROM (SELECT timestamp, wiki, event_username, event_action,
event_namespace, event_userEditCount FROM MobileWebEditing_5644223 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_6077315 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_6637866 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_7675117 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_8599025) as MobileWebEditing WHERE event_action = 'error' AND wiki != 'testwiki' GROUP BY Date ) AS Web ON Month.Date = Web.Date;
EXPLAIN:
+------+--------------+--------------------------+--------+---------------+---------+---------+------------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+--------------------------+--------+---------------+---------+---------+------------+----------+----------------------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 30 | | | 1 | PRIMARY | <derived4> | ref | key0 | key0 | 4 | Month.Date | 563154 | Using where | | 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 56315405 | Using where; Using temporary; Using filesort | | 5 | DERIVED | MobileWebEditing_5644223 | ALL | NULL | NULL | NULL | NULL | 1152600 | | | 6 | UNION | MobileWebEditing_6077315 | ALL | NULL | NULL | NULL | NULL | 685212 | | | 7 | UNION | MobileWebEditing_6637866 | ALL | NULL | NULL | NULL | NULL | 1528269 | | | 8 | UNION | MobileWebEditing_7675117 | ALL | NULL | NULL | NULL | NULL | 1663281 | | | 9 | UNION | MobileWebEditing_8599025 | ALL | NULL | NULL | NULL | NULL | 51286043 | | | NULL | UNION RESULT | <union5,6,7,8,9> | ALL | NULL | NULL | NULL | NULL | NULL | | | 2 | DERIVED | <derived3> | system | NULL | NULL | NULL | NULL | 1 | | | 2 | DERIVED | seq_1_to_100 | index | NULL | PRIMARY | 8 | NULL | 100 | Using index | | 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------+--------------------------+--------+---------------+---------+---------+------------+----------+----------------------------------------------+
DBA @ WMF
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics