On Sat, Dec 30, 2017 at 1:00 PM, Huji Lee <huji.huji(a)gmail.com> wrote:
> The query tries to use indexes, as much as I could think of

You missed the effective ones, though.
Let's start by getting the EXPLAIN output for your query.
https://tools.wmflabs.org/tools-info/optimizer.py? doesn't seem to like the
subquery for some reason, but doing it manually, by running the query on one
connection and issuing SHOW EXPLAIN FOR its thread id on another (sketched
just after the plan below), tells us:
+------+--------------------+-------------+--------+------------------------------+----------------+---------+------------+----------+------------------------------------+
| id   | select_type        | table       | type   | possible_keys                | key            | key_len | ref        | rows     | Extra                              |
+------+--------------------+-------------+--------+------------------------------+----------------+---------+------------+----------+------------------------------------+
|    1 | PRIMARY            | logging     | ALL    | type_time,times,type_action  | NULL           | NULL    | NULL       | 10809607 | Using where                        |
|    1 | PRIMARY            | page        | eq_ref | PRIMARY                      | PRIMARY        | 4       | func       |        1 | Using where                        |
|    1 | PRIMARY            | user_groups | eq_ref | PRIMARY,ug_group             | PRIMARY        | 261     | func,const |        1 | Using where; Using index           |
|    2 | DEPENDENT SUBQUERY | revision    | ref    | rev_timestamp,user_timestamp | user_timestamp | 4       | func       |       46 | Using index condition; Using where |
+------+--------------------+-------------+--------+------------------------------+----------------+---------+------------+----------+------------------------------------+
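(For reference, a minimal sketch of that manual two-connection approach; the
thread id here is a placeholder, not from this session:

    -- Connection 1: note the thread id, then start the slow query.
    SELECT CONNECTION_ID();   -- suppose this returns 12345
    -- ... run the slow query here and leave it running ...

    -- Connection 2: while the query is still executing, ask the server for
    -- the plan it is actually using (MariaDB syntax).
    SHOW EXPLAIN FOR 12345;
)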
That "ALL" on the first line isn't very encouraging, nor is the estimate
that it's going to have to touch 10 million rows.
Your WHERE clause is selecting on log_action and log_timestamp, and there
isn't an index covering that combination in
https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenanc….
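For context, the logging indexes named in possible_keys above are, paraphrased
from memory of that file (follow the link for the authoritative definitions):

    -- Paraphrased; the column order is what matters here.
    CREATE INDEX type_time   ON logging (log_type, log_timestamp);
    CREATE INDEX times       ON logging (log_timestamp);
    CREATE INDEX type_action ON logging (log_type, log_action, log_timestamp);

Since log_action appears only behind log_type in type_action, a filter on
log_action without log_type can't use any of these indexes effectively.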
You probably meant to use log_type rather than log_action. Making that
change gives a much nicer-looking plan:
+------+--------------------+-------------+--------+------------------------------+----------------+---------+------------+-------+------------------------------------+
| id   | select_type        | table       | type   | possible_keys                | key            | key_len | ref        | rows  | Extra                              |
+------+--------------------+-------------+--------+------------------------------+----------------+---------+------------+-------+------------------------------------+
|    1 | PRIMARY            | logging     | range  | type_time,times,type_action  | type_time      | 50      | NULL       | 76090 | Using index condition              |
|    1 | PRIMARY            | user_groups | eq_ref | PRIMARY,ug_group             | PRIMARY        | 261     | func,const |     1 | Using where; Using index           |
|    1 | PRIMARY            | page        | eq_ref | PRIMARY                      | PRIMARY        | 4       | func       |     1 | Using where                        |
|    2 | DEPENDENT SUBQUERY | revision    | ref    | rev_timestamp,user_timestamp | user_timestamp | 4       | func       |    46 | Using index condition; Using where |
+------+--------------------+-------------+--------+------------------------------+----------------+---------+------------+-------+------------------------------------+
It's still going to be slow, though, just because it still has to touch and
return tens of thousands of rows. And that slow subquery isn't going to help
matters either, since it has to run once for each row in the result set. You
might wind up having to run the main query alone and then do the subquery in
batches, merging the results in your client code.
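A rough sketch of that batching idea; the id list and the timestamp cutoff
are made up for illustration, since your original query isn't shown here:

    -- Hypothetical batch: collect user ids from the main query's results in
    -- the client, then resolve the subquery for a chunk of them in one go.
    -- rev_user and rev_timestamp are real revision columns; the ids and the
    -- cutoff are placeholders.
    SELECT rev_user, COUNT(*) AS edits
    FROM revision
    WHERE rev_user IN (101, 102, 103)        -- one batch of ids
      AND rev_timestamp < '20180101000000'   -- whatever cutoff the subquery used
    GROUP BY rev_user;

Each batch then does one indexed walk of user_timestamp per user instead of
one correlated subquery per row of the outer result, and the client merges
the counts back by user id.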
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation