Has anyone taken responsibility for fixing this?
On Sun, May 4, 2014 at 5:58 AM, Sean Pringle springle@wikimedia.org wrote:
Hi All
The query form listed inline below runs periodically on s1-analytics-slave and now analytics-store. It generates 30 days worth of data without gaps by joining a table known to have more than 30 rows.
That old trick is perfectly OK. However the table chosen is information_schema.columns and that's NOT OK :-)
The problem is that the metadata in some information_schema tables must be materialized every time they're accessed. In this case the .frm file for *every table in the system* must be opened and checked before the query runs[1]. Yes, every table.
This was probably always slow on s1-analytics-slave with enwiki + log + personal tables. It's even slower now that analytics-store holds all wikis.
SELECT Month.Date, COALESCE(Web.Web, 0) AS Web FROM (SELECT DATE_FORMAT(ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY, @num:=@num+1), '%Y-%m-%d') AS Date FROM information_schema.columns, (SELECT @num:=-1) num LIMIT 30) AS Month LEFT JOIN (SELECT DATE(timestamp) AS Date, SUM(1) AS Web FROM ( SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_5644223 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_6077315 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_6637866 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_7675117 ) as MobileWebEditing WHERE event_action = 'error' AND wiki != 'testwiki' GROUP BY Date ) AS Web ON Month.Date = Web.Date;
MariaDB 10 has an SQL trick for generating sequences. Hackish, but simple:
SELECT seq FROM seq_1_to_5;
+-----+ | seq | +-----+ | 1 | | 2 | | 3 | | 4 | | 5 | +-----+
https://mariadb.com/kb/en/sequence/
Alternatively, we can put a real sequence table somewhere handy with, say, 1000 integers.
BR Sean
[1] The file accesses can be alleviated by having large table [definition] cache(s), however then we're talking hundreds of thousands more open file handles which is a whole new ceiling waiting to be hit :-)
-- DBA @ WMF
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics