On Mon, May 12, 2014 at 8:48 PM, Gilles Dubuc <gilles@wikimedia.org> wrote:
The ones I'm trying to improve are querying MediaViewer_8245578 (just grep for that on the page you've linked to, there will be some things I ran manually today, but the big queries are the problematic ones). The existing queries are full table scans and seem to take around 15 minutes. We run 27 of these every day one after the other. The index I want to add will actually help me break those queries down into several smaller queries. There's no point trying to break them down right now, because I expect that without indexes the simpler queries will take just as long because they'll be full table scans as well.Which ones are your tsvs stuff?
Ok, think I see the right ones.
If you want to avoid breaking the queries up, push the conditions down onto each base table of the union:
SELECT
...
FROM (
SELECT timestamp, wiki, event_action FROM MediaViewer_7670440
WHERE wiki = 'cawiki'
AND timestamp < TIMESTAMP(CURDATE())
AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
UNION ALL
SELECT timestamp, wiki, event_action FROM MediaViewer_8245578
WHERE wiki = 'cawiki'
AND timestamp < TIMESTAMP(CURDATE())
AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
) AS MediaViewerUnioned
GROUP BY datestring ORDER BY datestring ASC;That will allow a range access on the base tables' timestamp indexes which should reduce the row count a bit. Then if we add an index on (wiki, timestamp), which I'll kick off now, it drops the execution time to ~1min.
UNION ALL avoids the sorting overhead of UNION DISTINCT but unfortunately still materializes the entire set into a temporary table. Pushing the conditions down makes the query more verbose, but perhaps that's still easier than maintaining multiple queries and emulating union.
Sean
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics