All great answers; thank you! Let's wait and see what comes out of T224850.

On Tue, Jun 4, 2019 at 8:21 PM Bryan Davis <bd808@wikimedia.org> wrote:
On Tue, Jun 4, 2019 at 4:54 PM Huji Lee <huji.huji@gmail.com> wrote:
>
> Hi all,
>
> I have a query that used to do a GROUP BY on rev_user and now I have migrated it to do so using rev_actor instead. The latest version looks like this [0] and its performance is not good [1] at all. In fact, I left it running for fawiki for several hours and it did not generate any results (it used to generate results in about 30-40 minutes). Also, the execution plan that is shown in the second link involves some tables like filearchive, image and oldimage that are not part of the query.
>
> Is this a side effect of the revision_actor_temp table?

The EXPLAIN output including filearchive, image, oldimage, and other
apparently unrelated fields is caused by the view for the actor table.
This is basically the same problem as I described in
<https://lists.wikimedia.org/pipermail/cloud/2019-June/000683.html>
for the view of the comment table. The flag for suppression lives in
the related entity table, so the actor view runs correlated
sub-queries for each entity table that could indicate that a specific
row from the actor table has been suppressed.

I created <https://phabricator.wikimedia.org/T224850> to look for a
solution for this same class of issue in the comment table. If we find
a fix there it seems likely that we could do the same thing for the
actor table, whatever that may end up being.

> Also, what is that "key0" about, which is not associated with any tables?

This is a key generated by the database on a temporary table that is
being used to answer your query. It looks like there is a little
display bug in the sql-optimizer tool. The table name that should be
shown is "<derived2>" which you can see in the HTML source of the
page. See <https://mariadb.com/kb/en/library/derived-table-with-key-optimization/>
for some details on this type of internal optimization.

> Finally, any advice on how to make it more efficient is highly appreciated.

The "Using filesort" notes in the plan are caused by the GROUP BYs. If
you can figure out how to get rid of the GROUP BY clauses things
should be faster, but to do that I think you might need to collect
information using multiple queries and some scripting language to
collect and correlate the data outside of the database.

I would generally expect any query that tries to compute statistics
for all editors who have ever been active on a wiki will be slow. You
might try finding a good way to break the query into smaller batches.

> [0]: https://pastebin.com/jWTzsiJY
> [1]: https://tools.wmflabs.org/sql-optimizer?use=fawiki_p&sql=SELECT%0D%0A++actor_name%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+0+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+article%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+10+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+tpl%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+12+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+helppage%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+14+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+cat%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+100+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+portal%2C%0D%0A++COUNT%28rev_first%29+tot%0D%0AFROM+revision+r%0D%0AJOIN+%28%0D%0A++select%0D%0A++++MIN%28rev_id%29+rev_first%2C%0D%0A++++rev_page%0D%0A++FROM+revision%0D%0A++GROUP+BY+rev_page%0D%0A%29+f%0D%0A++ON+r.rev_id+%3D+f.rev_first%0D%0AJOIN+page%0D%0A++ON+page_id+%3D+r.rev_page%0D%0AJOIN+actor%0D%0A++ON+rev_actor+%3D+actor_id%0D%0ALEFT+JOIN+user_groups%0D%0A++ON+actor_user+%3D+ug_user%0D%0A++AND+ug_group+%3D+%22bot%22%0D%0AWHERE%0D%0A++actor_user+%3C%3E+0%0D%0A++AND+ug_group+IS+NULL%0D%0A++AND+page_namespace+IN+%28%0D%0A++++0%2C%0D%0A++++10%2C%0D%0A++++12%2C%0D%0A++++14%2C%0D%0A++++100%0D%0A++%29%0D%0AGROUP+BY+rev_actor%0D%0AORDER+BY+tot+desc%0D%0ALIMIT+300


Bryan
--
Bryan Davis              Wikimedia Foundation    <bd808@wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement    Boise, ID USA
irc: bd808                                        v:415.839.6885 x6855

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud