Hey, One important optimization you can use and it's often missed out (and it's going to be needed more as we normalize more tables) is join decomposition. It basically means you don't join and query but do two (or several) queries separately in your code. This might seem counter intuitive but it's pretty useful for several reasons (like MySQL will cache the normalizing table and answer faster or you yourself can cache some parts). You can read more about those in "High Performance MySQL" book.
HTH
On Mon, Jun 3, 2019 at 3:43 AM Jesse Plamondon-Willard < pathoschild@gmail.com> wrote:
Hello,
Some Toolforge SQL queries have much worse performance when updated for the new comment table.
For example, see one previous SQL query https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=SELECT%0D%0A++++++++++++++++++++user_id%2C%0D%0A++++++++++++++++++++user_registration%2C%0D%0A++++++++++++++++++++DATE_FORMAT%28user_registration%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+registration%2C%0D%0A++++++++++++++++++++user_editcount%2C%0D%0A++++++++++++++++++++GROUP_CONCAT%28ug_group+SEPARATOR+%22%2C+%22%29+AS+user_groups%2C%0D%0A++++++++++++++++++++ipb_by_text%2C%0D%0A++++++++++++++++++++%27%27+AS+ipb_reason%2C+--+column+deleted%0D%0A++++++++++++++++++++DATE_FORMAT%28ipb_timestamp%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+ipb_timestamp%2C%0D%0A++++++++++++++++++++ipb_deleted%2C%0D%0A++++++++++++++++++++COALESCE%28DATE_FORMAT%28ipb_expiry%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29%2C+ipb_expiry%29+AS+ipb_expiry%0D%0A++++++++++++++++FROM%0D%0A++++++++++++++++++++user%0D%0A++++++++++++++++++++LEFT+JOIN+user_groups+ON+user_id+%3D+ug_user%0D%0A++++++++++++++++++++LEFT+JOIN+ipblocks_ipindex+ON+user_id+%3D+ipb_user%0D%0A++++++++++++++++WHERE+user_name+%3D+%27Pathoschild%27%0D%0A++++++++++++++++LIMIT+1 and its updated version https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=SELECT%0D%0A++++++++++++++++++++user_id%2C%0D%0A++++++++++++++++++++user_registration%2C%0D%0A++++++++++++++++++++DATE_FORMAT%28user_registration%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+registration%2C%0D%0A++++++++++++++++++++user_editcount%2C%0D%0A++++++++++++++++++++GROUP_CONCAT%28ug_group+SEPARATOR+%22%2C+%22%29+AS+user_groups%2C%0D%0A++++++++++++++++++++ipb_by_text%2C%0D%0A++++++++++++++++++++comment_text+AS+ipb_reason%2C%0D%0A++++++++++++++++++++DATE_FORMAT%28ipb_timestamp%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+ipb_timestamp%2C%0D%0A++++++++++++++++++++ipb_deleted%2C%0D%0A++++++++++++++++++++COALESCE%28DATE_FORMAT%28ipb_expiry%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29%2C+ipb_expiry%29+AS+ipb_expiry%0D%0A++++++++++++++++FROM%0D%0A++++++++++++++++++++user%0D%0A++++++++++++++++++++LEFT+JOIN+user_groups+ON+user_id+%3D+ug_user%0D%0A++++++++++++++++++++LEFT+JOIN+ipblocks_ipindex+ON+user_id+%3D+ipb_user%0D%0A++++++++++++++++++++LEFT+JOIN+comment+ON+ipb_reason_id+%3D+comment_id%0D%0A++++++++++++++++WHERE+user_name+%3D+%27Pathoschild%27%0D%0A++++++++++++++++LIMIT+1 which just adds a join on the comment table. The change adds ten dependent subqueries and increases Stalktoy load times from ≈15 seconds to ≈245 seconds, and the same change for another query was enough to have my tools temporarily rate-limited https://phabricator.wikimedia.org/T217853 .
I haven't found a way to update those queries efficiently. Is there an optimisation I'm missing? Why does the comment view need subqueries on ten other tables, and is there an alternate version without the subqueries for cases where we just need to join by ID?
-- Jesse Plamondon-Willard (Pathoschild) _______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud