Alright, now that I've unearthed the page I was looking for, a fuller answer. The wiki replicas have to do a lot of database magic to remove private/deleted data, which means that the normal tables can be slow.
For your logging query, logging_logindex makes things faster because log_action is nulled for deleted log entries in logging and logging_userindex. In logging_logindex, rows with a deleted action are removed entirely: you lose the data for deleted log entries at the benefit of indexes on log_action. That's not a concern with this query because you're filtering on log action anyway.
The actor view on the replicas has to do even more database magic and perform subqueries on 8 other tables, so it's almost always going to slow things down. If you use actor_logging instead of actor, you skip 7 of those subqueries.
More information: https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Alternative_Views https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_R... https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revis... https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/productio... (configuration for the views on the replicas)
As for why these queries were faster a few months ago, that's probably related to database server issues (https://phabricator.wikimedia.org/T246970 and https://phabricator.wikimedia.org/T247978).
Using the optimized views brings the all-time query down to about 15 minutes (https://quarry.wmflabs.org/query/43984) and the 2019 query down to about 106 seconds (https://quarry.wmflabs.org/query/43985).
AntiCompositeNumber
On Wed, Apr 15, 2020 at 18:13 Maciej Jaros egil@wp.pl wrote:
Also note that I have a similar query that kind of works but takes much longer to complete then it used to. https://quarry.wmflabs.org/query/41680
Executed in 46.22 seconds as of Wed Jan 29 2020. Resultset (1480 rows)
Executed in 188.32 seconds as of Wed Apr 15 2020. Resultset (1480 rows)
So thats about 4 times longer then before on the same data. Well actor table is probably longer, but I assume this will use indexes: `LEFT JOIN actor a ON actor_id = log_actor`.
Cheers, Nux.
Maciej Jaros (2020-04-15 23:56):
Not sure if this is an appropriate place to discuss this but I'm not able to run this query: https://quarry.wmflabs.org/query/24267
Even a minor part of this query like below needs a significant time to complete. SELECT count(*) as review_count, log_actor FROM logging WHERE log_type = 'review' AND log_action = 'approve' GROUP BY log_actor
I tried with logging_userindex but that don't seem to help. Any chances on adding extra indexes to make such queries work? I think querying log_type and log_actor should be quite common for various user stats.
Even this takes almost 200 seconds to complete: SELECT count(*) as review_count, log_actor FROM logging_userindex WHERE log_type = 'review' AND log_action = 'approve' AND log_timestamp >= 20190101000000 AND log_timestamp <= 20190101235959 GROUP BY log_actor
Also note that the query 24267 used to work in January 2020. I searched through the list, but haven't found any changes related to the logging table since January.
Cheers, Nux.
Wikimedia Cloud Services mailing listCloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud