Alright, now that I've unearthed the page I was looking for, a fuller
answer.
The wiki replicas have to do a lot of database magic to remove
private/deleted data, which means that the normal tables can be slow.
For your logging query, logging_logindex makes things faster because
log_action is nulled for deleted log entries in logging and
logging_userindex. In logging_logindex, rows with a deleted action are
removed entirely: you lose the data for deleted log entries at the benefit
of indexes on log_action. That's not a concern with this query because
you're filtering on log action anyway.
The actor view on the replicas has to do even more database magic and
perform subqueries on 8 other tables, so it's almost always going to slow
things down. If you use actor_logging instead of actor, you skip 7 of those
subqueries.
More information:
https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Alternative_Views
https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_…
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revi…
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/producti…
(configuration for the views on the replicas)
As for why these queries were faster a few months ago, that's probably
related to database server issues (
https://phabricator.wikimedia.org/T246970
and
https://phabricator.wikimedia.org/T247978).
Using the optimized views brings the all-time query down to about 15
minutes (
https://quarry.wmflabs.org/query/43984) and the 2019 query down to
about 106 seconds (
https://quarry.wmflabs.org/query/43985).
AntiCompositeNumber
On Wed, Apr 15, 2020 at 18:13 Maciej Jaros <egil(a)wp.pl> wrote:
Also note that I have a similar query that kind of
works but takes much
longer to complete then it used to.
https://quarry.wmflabs.org/query/41680
Executed in 46.22 seconds as of Wed Jan 29 2020.
Resultset (1480 rows)
Executed in 188.32 seconds as of Wed Apr 15 2020.
Resultset (1480 rows)
So thats about 4 times longer then before on the same data. Well actor
table is probably longer, but I assume this will use indexes: `LEFT JOIN
actor a ON actor_id = log_actor`.
Cheers,
Nux.
Maciej Jaros (2020-04-15 23:56):
Not sure if this is an appropriate place to discuss this but I'm not able
to run this query:
https://quarry.wmflabs.org/query/24267
Even a minor part of this query like below needs a significant time to
complete.
SELECT count(*) as review_count, log_actor
FROM logging
WHERE log_type = 'review' AND log_action = 'approve'
GROUP BY log_actor
I tried with logging_userindex but that don't seem to help. Any chances on
adding extra indexes to make such queries work? I think querying log_type
and log_actor should be quite common for various user stats.
Even this takes almost 200 seconds to complete:
SELECT count(*) as review_count, log_actor
FROM logging_userindex
WHERE log_type = 'review' AND log_action = 'approve'
AND log_timestamp >= 20190101000000
AND log_timestamp <= 20190101235959
GROUP BY log_actor
Also note that the query 24267 used to work in January 2020. I searched
through the list, but haven't found any changes related to the logging
table since January.
Cheers,
Nux.
_______________________________________________
Wikimedia Cloud Services mailing listCloud(a)lists.wikimedia.org (formerly
labs-l@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud