If – like me – you had jobs or queries on the analytics slaves that used to finish in
seconds or minutes and suddenly started taking anywhere from several minutes to hours,
you were probably affected by a problem with the storage engine used for some tables on
the slaves. Sean nailed down the issue, which has now been temporarily solved by
restoring the recentchanges tables to InnoDB. Post-mortem below (thanks, Sean!)
Dario
Aside from tuning the SQL, I think the slowdown you saw was at least partly due to the
migration of tables from InnoDB to TokuDB. The combination of a table scan on either
change_tag or tag_summary plus an eq_ref join on recentchanges hits a performance wall
somewhere around 3M rows. It isn't due to obvious things like memory, swap, mutexes, or
temp tables hitting disk. So I guess I need to search the upstream bug list :-)
Found it:
https://mariadb.atlassian.net/browse/MDEV-5595
It relates specifically to slowdowns on tables with high delete rates, precisely like
our recentchanges :-)
It will take a couple of days to organize packages with the relevant fix, so for now
I've reverted recentchanges to InnoDB on the analytics boxes. No other tables or queries
seem to be affected.
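
For anyone who wants to check which engine a given table is on, here is a minimal sketch using the standard MariaDB information_schema. The schema name 'enwiki' is only an illustrative assumption. The ALTER statement shows one common way to switch a table's engine, and is not necessarily the exact procedure used for the revert described above.

```sql
-- Show the storage engine currently used by the tables involved.
-- 'enwiki' is an assumed example schema; substitute the wiki you query.
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_schema = 'enwiki'
  AND table_name IN ('recentchanges', 'change_tag', 'tag_summary');

-- One common way to convert a table back to InnoDB (rebuilds the table,
-- so it can take a while and needs ALTER privileges on the replica).
ALTER TABLE enwiki.recentchanges ENGINE=InnoDB;
```

Running EXPLAIN on a slow query should also show whether it still follows the pattern described above (a full scan on change_tag or tag_summary with an eq_ref lookup on recentchanges).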