<div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">I wanted to revive this thread: can someone please help me find a way to boost that query?<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <span dir="ltr"><<a href="mailto:huji.huji@gmail.com" target="_blank">huji.huji@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that query every few minutes until it timed out, and the number rows it was querying increased drastically. It very quickly reached <a href="tel:2399220000" value="+12399220000" target="_blank">2399220000</a>, with the last value I go before timeout being 3693530000. That is just way too many.<br><br>T139090 could be the cause; it went into effect the day after last successful query, and it does affect indexes used by the query.<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">One thing that can obviously make my query faster is to force it to run the joins in a particular order. I tried to enforce it through creating temporary tables in memory, but it failed: <a href="https://quarry.wmflabs.org/query/12719" target="_blank">https://quarry.wmflabs.org/<wbr>query/12719</a><br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Another approach would be to create indexes. However, my permissions don't allow me to see what indexes currently exist (is there a way around that?) so I cannot decide what new indexes can be added.<br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo <span dir="ltr"><<a href="mailto:jcrespo@wikimedia.org" target="_blank">jcrespo@wikimedia.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>So I cannot give you very specific advise, but based on the EXPLAIN, it seems that you may be trying to read too many rows, up to<br></div>100K * 300K * 300K (10 000 M) and then sort them, which is probably going to be very slow.<br><br></div>If they used to work in the past there could be 2 probable explanations: a lot of rows have been inserted recently on one or several of the tables (e.g. a new template or category with many members) or the indexes have changed. Note that redoing the queries is something that we have to do constantly in production because rows quantities change. I also can think of this change that happened recently on production, but cannot say for sure if it is related or could affect you negatively:<br><br><a href="https://phabricator.wikimedia.org/T139090" target="_blank">https://phabricator.wikimedia.<wbr>org/T139090</a><br><br></div>Maybe that affects your query and it is as easy to fix as reordering your columns/changing slightly the filters or its order. If someone has a suggestion to make it faster, that requires a change to labs, remember that labs replicas are not static, and new indexes can be added if needed by the community, and I will gladly apply them myself.<br></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <span dir="ltr"><<a href="mailto:huji.huji@gmail.com" target="_blank">huji.huji@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">With two connections to the DB via Terminal, it is possible! Here is what I got; it matches what you sent above too:<br><br><br><br><font size="1"><span style="font-family:monospace,monospace">MariaDB [fawiki_p]> show processlist;<br>+---------+--------+----------<wbr>---------+----------+---------<wbr>+------+----------------------<wbr>------+-----------------------<wbr>------------------------------<wbr>------------------------------<wbr>-------------------+----------<wbr>+<br>| Id      | User   | Host              | db       | Command | Time | State                      | Info                          <wbr>                              <wbr>                              <wbr>           | Progress |<br>+---------+--------+----------<wbr>---------+----------+---------<wbr>+------+----------------------<wbr>------+-----------------------<wbr>------------------------------<wbr>------------------------------<wbr>-------------------+----------<wbr>+<br>| 3556437 | s51403 | <a href="http://10.68.23.58:53391" target="_blank">10.68.23.58:53391</a> | fawiki_p | Query   |   19 | Queried about 5450000 rows | select page_title, count(ll_lang) from page join category on page_title = cat_title left join catego |    0.000 |<br>| 3556865 | s51403 | <a href="http://10.68.23.58:54083" target="_blank">10.68.23.58:54083</a> | fawiki_p | Query   |    0 | init                       | show processlist                   <wbr>                              <wbr>                              <wbr>      |    0.000 |<br>+---------+--------+----------<wbr>---------+----------+---------<wbr>+------+----------------------<wbr>------+-----------------------<wbr>------------------------------<wbr>------------------------------<wbr>-------------------+----------<wbr>+<br>2 rows in set (0.00 sec)<br><br>MariaDB [fawiki_p]> show explain for 3556437;<br>+------+-------------+--------<wbr>-------+--------+-------------<wbr>---------------------------+--<wbr>---------------------------+--<wbr>-------+----------------------<wbr>--+--------+------------------<wbr>----------------------------+<br>| id   | select_type | table         | type   | possible_keys                 <wbr>         | key                         | key_len | ref                    | rows   | Extra                         <wbr>               |<br>+------+-------------+--------<wbr>-------+--------+-------------<wbr>---------------------------+--<wbr>---------------------------+--<wbr>-------+----------------------<wbr>--+--------+------------------<wbr>----------------------------+<br>|    1 | SIMPLE      | page          | ref    | name_title,page_redirect_names<wbr>pace_len | page_redirect_namespace_len | 5       | const,const            | 364177 | Using where; Using temporary; Using filesort |<br>|    1 | SIMPLE      | category      | eq_ref | cat_title                     <wbr>         | cat_title                   | 257     | fawiki.page.page_title |      1 | Using index                         <wbr>         |<br>|    1 | SIMPLE      | categorylinks | ref    | cl_timestamp,cl_sortkey       <wbr>         | cl_timestamp                | 257     | fawiki.page.page_title |     12 | Using where; Using index                     |<br>|    1 | SIMPLE      | templatelinks | ref    | tl_from,tl_namespace          <wbr>         | tl_namespace                | 4       | const                  | 390622 | Using where; Using index                     |<br>|    1 | SIMPLE      | langlinks     | ref    | ll_from                       <wbr>         | ll_from                     | 4       | fawiki.page.page_id    | 104911 | Using index                         <wbr>         |<br>+------+-------------+--------<wbr>-------+--------+-------------<wbr>---------------------------+--<wbr>---------------------------+--<wbr>-------+----------------------<wbr>--+--------+------------------<wbr>----------------------------+<br></span></font><br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee <span dir="ltr"><<a href="mailto:huji.huji@gmail.com" target="_blank">huji.huji@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">Oh, I misread your comment.<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">I doubt SHOW EXPLAIN would work via Quarry, so I am trying to get it to work via terminal access to Labs. I wonder though if it works for a query that fails to complete.<br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo <span dir="ltr"><<a href="mailto:jcrespo@wikimedia.org" target="_blank">jcrespo@wikimedia.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Huji, note I suggested SHOW EXPLAIN, not EXPLAIN; it is a different command, check the link I provided and report if that doesn't work for you.<br></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee <span dir="ltr"><<a href="mailto:huji.huji@gmail.com" target="_blank">huji.huji@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div style="font-family:tahoma,sans-serif" class="gmail_default">See <a href="https://quarry.wmflabs.org/query/12718" target="_blank">https://quarry.wmflabs.org/que<wbr>ry/12718</a> and the error message returned.<br><br></div><div style="font-family:tahoma,sans-serif" class="gmail_default">EXPLAIN only works if you have access to the underlying tables. I only have access to the VIEWs built on top of actual wiki tables (for good reason, same applies to most of us) so I get an error when I try EXPLAIN.<br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo <span dir="ltr"><<a href="mailto:jcrespo@wikimedia.org" target="_blank">jcrespo@wikimedia.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Explain (you should be able to run SHOW EXPLAIN on your own queries: <a href="http://s.petrunia.net/blog/?p=89" target="_blank">http://s.petrunia.net/blog/?p=<wbr>89</a>):<br><br>EXPLAIN select page_title, count(ll_lang) from page join category on page_title = cat_title left join categorylinks on page_title = cl_to left join templatelinks on tl_from = page_id and tl_namespace = 10 and tl_title in (   'رده_خالی' , 'رده_بهتر', 'رده_ابهام‌زدایی', 'رده_ردیابی‌کردن' ) left join langlinks on page_id = ll_from where page_namespace = 14 and page_is_redirect = 0 and cl_to is null and tl_title is null group by page_title order by 2, 1 limit 5000\G<br>*************************** 1. row ***************************<br>           id: 1<br>  select_type: SIMPLE<br>        table: page<br>         type: ref<br>possible_keys: name_title,page_redirect_names<wbr>pace_len<br>          key: page_redirect_namespace_len<br>      key_len: 5<br>          ref: const,const<br>         rows: 364273<br>        Extra: Using where; Using temporary; Using filesort<br>*************************** 2. row ***************************<br>           id: 1<br>  select_type: SIMPLE<br>        table: category<br>         type: eq_ref<br>possible_keys: cat_title<br>          key: cat_title<br>      key_len: 257<br>          ref: fawiki.page.page_title<br>         rows: 1<br>        Extra: Using index<br>*************************** 3. row ***************************<br>           id: 1<br>  select_type: SIMPLE<br>        table: categorylinks<br>         type: ref<br>possible_keys: cl_timestamp,cl_sortkey<br>          key: cl_timestamp<br>      key_len: 257<br>          ref: fawiki.page.page_title<br>         rows: 12<br>        Extra: Using where; Using index<br>*************************** 4. row ***************************<br>           id: 1<br>  select_type: SIMPLE<br>        table: templatelinks<br>         type: ref<br>possible_keys: tl_from,tl_namespace<br>          key: tl_namespace<br>      key_len: 4<br>          ref: const<br>         rows: 390610<br>        Extra: Using where; Using index<br>*************************** 5. row ***************************<br>           id: 1<br>  select_type: SIMPLE<br>        table: langlinks<br>         type: ref<br>possible_keys: ll_from<br>          key: ll_from<br>      key_len: 4<br>          ref: fawiki.page.page_id<br>         rows: 104910<br>        Extra: Using index<br></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee <span dir="ltr"><<a href="mailto:huji.huji@gmail.com" target="_blank">huji.huji@gmail.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">Hi all,<br><br>I have a query [1] which I run weekly to identify empty categories; those that don't have interwiki links and stay empty for a while are then deleted by a sysop.<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">The query ran just fine every week until Sep 2nd. [2] Since then, the query times out and the page doesn't get updated.<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">The query is no that complex (a SELECT with five JOINs) and used to finish in about 3-5 minutes. I don't have EXPLAIN access on Labs so I cannot tell what is slowing it down. Can someone kindly take a look and advise why this suddenly stopped working?<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Thanks,<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Huji<br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br>[1] <a href="https://quarry.wmflabs.org/query/3760" target="_blank">https://quarry.wmflabs.org/que<wbr>ry/3760</a><br>[2] <a href="https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en" target="_blank">https://fa.wikipedia.org/w/ind<wbr>ex.php?title=%D9%88%DB%8C%DA%A<wbr>9%DB%8C%E2%80%8C%D9%BE%D8%AF%D<wbr>B%8C%D8%A7:%DA%AF%D8%B2%D8%A7%<wbr>D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA<wbr>%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B<wbr>1%D8%AF%D9%87%E2%80%8C%D9%87%D<wbr>8%A7%DB%8C_%D8%AE%D8%A7%D9%84%<wbr>DB%8C&action=history&uselang=e<wbr>n</a><br></div></div>
<br></div></div>______________________________<wbr>_________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org" target="_blank">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" target="_blank">https://lists.wikimedia.org/ma<wbr>ilman/listinfo/labs-l</a><br>
<br></blockquote></div><span><font color="#888888"><br><br clear="all"><br>-- <br><div data-smartmail="gmail_signature"><div dir="ltr"><div>Jaime Crespo<br></div><<a href="http://wikimedia.org" target="_blank">http://wikimedia.org</a>><br></div></div>
</font></span></div>
<br>______________________________<wbr>_________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org" target="_blank">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" target="_blank">https://lists.wikimedia.org/ma<wbr>ilman/listinfo/labs-l</a><br>
<br></blockquote></div><br></div>
</div></div><br>______________________________<wbr>_________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org" target="_blank">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" target="_blank">https://lists.wikimedia.org/ma<wbr>ilman/listinfo/labs-l</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br><div data-smartmail="gmail_signature"><div dir="ltr"><div>Jaime Crespo<br></div><<a href="http://wikimedia.org" target="_blank">http://wikimedia.org</a>><br></div></div>
</div>
</div></div><br>______________________________<wbr>_________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org" target="_blank">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" target="_blank">https://lists.wikimedia.org/ma<wbr>ilman/listinfo/labs-l</a><br>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div><br>______________________________<wbr>_________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org" target="_blank">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" target="_blank">https://lists.wikimedia.org/ma<wbr>ilman/listinfo/labs-l</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br><div data-smartmail="gmail_signature"><div dir="ltr"><div>Jaime Crespo<br></div><<a href="http://wikimedia.org" target="_blank">http://wikimedia.org</a>><br></div></div>
</div>
</div></div><br>______________________________<wbr>_________________<br>
Labs-l mailing list<br>
<a href="mailto:Labs-l@lists.wikimedia.org" target="_blank">Labs-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" target="_blank">https://lists.wikimedia.org/ma<wbr>ilman/listinfo/labs-l</a><br>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br></div>