If, like me, you had jobs or queries on the analytics slaves that used to finish in seconds or minutes and suddenly started taking anywhere from several minutes to hours, you were probably affected by a problem with the storage engine used for some tables on the slaves.
Sean tracked down the issue, which has now been temporarily worked around by converting the recentchanges tables back to InnoDB. Post-mortem below (thanks, Sean!)
Dario
Aside from tuning the SQL, I think the slowdown you saw was at least partly due to the migration of tables from InnoDB to TokuDB. The combination of a table scan on either change_tag or tag_summary plus an eq_ref join on recentchanges hits a performance wall somewhere around 3M rows. It isn't due to the obvious suspects, such as memory, swap, mutexes, or temporary tables spilling to disk, so I guess I need to search the upstream bug list :-)
Found it: https://mariadb.atlassian.net/browse/MDEV-5595. It relates specifically to slowdowns on tables with high delete rates, which is precisely what recentchanges has :-)
It will take a couple of days to prepare packages with the relevant fix, so for now I've reverted recentchanges to InnoDB on the analytics boxes. No other tables or queries seem to be affected.