Has anyone taken responsibility for fixing this?


On Sun, May 4, 2014 at 5:58 AM, Sean Pringle <springle@wikimedia.org> wrote:
Hi All

The query form listed inline below runs periodically on s1-analytics-slave and now analytics-store. It generates 30 days worth of data without gaps by joining a table known to have more than 30 rows.

That old trick is perfectly OK. However the table chosen is information_schema.columns and that's NOT OK :-)

The problem is that the metadata in some information_schema tables must be materialized every time they're accessed. In this case the .frm file for *every table in the system* must be opened and checked before the query runs[1]. Yes, every table.

This was probably always slow on s1-analytics-slave with enwiki + log + personal tables. It's even slower now that analytics-store holds all wikis.

SELECT
    Month.Date,
    COALESCE(Web.Web, 0) AS Web
FROM
    (SELECT
        DATE_FORMAT(ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY, @num:=@num+1), '%Y-%m-%d') AS Date
        FROM information_schema.columns, (SELECT @num:=-1) num LIMIT 30) AS Month
LEFT JOIN
    (SELECT DATE(timestamp) AS Date, SUM(1) AS Web FROM (
        SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount
            FROM MobileWebEditing_5644223
        UNION
        SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount
            FROM MobileWebEditing_6077315
        UNION
        SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount
            from MobileWebEditing_6637866
        UNION
        SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount
            from MobileWebEditing_7675117
        ) as MobileWebEditing
    WHERE
        event_action = 'error'
        AND wiki != 'testwiki'
    GROUP BY Date
) AS Web
ON Month.Date = Web.Date;

MariaDB 10 has an SQL trick for generating sequences. Hackish, but simple:

SELECT seq FROM seq_1_to_5;

+-----+
| seq |
+-----+
|   1 |
|   2 |
|   3 |
|   4 |
|   5 |
+-----+

https://mariadb.com/kb/en/sequence/

Alternatively, we can put a real sequence table somewhere handy with, say, 1000 integers.

BR
Sean

[1] The file accesses can be alleviated by having large table [definition] cache(s), however then we're talking hundreds of thousands more open file handles which is a whole new ceiling waiting to be hit :-)

--
DBA @ WMF

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics