Alright, now that I've unearthed the page I was looking for, a fuller answer.
The wiki replicas have to do a lot of database magic to remove private/deleted data, which means that the normal tables can be slow.

For your logging query, logging_logindex makes things faster because log_action is nulled for deleted log entries in logging and logging_userindex. In logging_logindex, rows with a deleted action are removed entirely: you lose the data for deleted log entries in exchange for indexes on log_action. That's not a concern with this query because you're filtering on log_action anyway.
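
A minimal sketch of that swap, assuming the rest of the smaller query from your message stays exactly as it was (only the view name changes):

```sql
-- Same aggregate as in the original message, but reading from
-- logging_logindex instead of logging / logging_userindex.
SELECT COUNT(*) AS review_count, log_actor
FROM logging_logindex
WHERE log_type = 'review' AND log_action = 'approve'
GROUP BY log_actor;
```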

The actor view on the replicas has to do even more database magic and perform subqueries on 8 other tables, so it's almost always going to slow things down. If you use actor_logging instead of actor, you skip 7 of those subqueries.
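
As a rough sketch, the join from your other query could be rewritten along these lines. The aliases l and a, and the choice to select actor_name, are my assumptions for illustration and not necessarily what your full query does:

```sql
-- Join against actor_logging rather than actor, so the replica view
-- skips the subqueries that are not needed for logging rows.
SELECT l.log_actor, a.actor_name, COUNT(*) AS review_count
FROM logging_logindex l
LEFT JOIN actor_logging a ON a.actor_id = l.log_actor
WHERE l.log_type = 'review' AND l.log_action = 'approve'
GROUP BY l.log_actor, a.actor_name;
```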

More information:
https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Alternative_Views
https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#The_actor_table_seems_really_slow--so_does_comment
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/templates/labs/db/views/maintain-views.yaml (configuration for the views on the replicas)

As for why these queries were faster a few months ago, that's probably related to database server issues (https://phabricator.wikimedia.org/T246970 and https://phabricator.wikimedia.org/T247978).

Using the optimized views brings the all-time query down to about 15 minutes (https://quarry.wmflabs.org/query/43984) and the 2019 query down to about 106 seconds (https://quarry.wmflabs.org/query/43985).

AntiCompositeNumber

On Wed, Apr 15, 2020 at 18:13 Maciej Jaros <egil@wp.pl> wrote:
Also note that I have a similar query that kind of works but takes much longer to complete than it used to.
https://quarry.wmflabs.org/query/41680

Executed in 46.22 seconds as of Wed Jan 29 2020.
Resultset (1480 rows)

Executed in 188.32 seconds as of Wed Apr 15 2020.
Resultset (1480 rows)

So that's about 4 times longer than before on the same data. Well, the actor table is probably longer, but I assume this will use indexes: `LEFT JOIN actor a ON actor_id = log_actor`.

Cheers,
Nux.

Maciej Jaros (2020-04-15 23:56):
Not sure if this is an appropriate place to discuss this but I'm not able to run this query:
https://quarry.wmflabs.org/query/24267

Even a small part of this query, like the one below, takes a significant amount of time to complete.
    SELECT count(*) as review_count, log_actor
    FROM logging
    WHERE log_type = 'review' AND log_action = 'approve'
    GROUP BY log_actor

I tried with logging_userindex but that doesn't seem to help. Any chance of adding extra indexes to make such queries work? I think querying log_type and log_actor should be quite common for various user stats.

Even this takes almost 200 seconds to complete:
    SELECT count(*) as review_count, log_actor
    FROM logging_userindex
    WHERE log_type = 'review' AND log_action = 'approve'
    AND log_timestamp >= 20190101000000
    AND log_timestamp <= 20190101235959
    GROUP BY log_actor

Also note that the query 24267 used to work in January 2020. I searched through the list, but haven't found any changes related to the logging table since January.

Cheers,
Nux.

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
