[Labs-l] Empty Cat query

Huji Lee huji.huji at gmail.com
Wed Sep 28 19:03:21 UTC 2016


I wanted to revive this thread: can someone please help me find a way to
boost that query?

On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <huji.huji at gmail.com> wrote:

> Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that query
> every few minutes until it timed out, and the number rows it was querying
> increased drastically. It very quickly reached 2399220000, with the last
> value I go before timeout being 3693530000. That is just way too many.
>
> T139090 could be the cause; it went into effect the day after last
> successful query, and it does affect indexes used by the query.
>
> One thing that can obviously make my query faster is to force it to run
> the joins in a particular order. I tried to enforce it through creating
> temporary tables in memory, but it failed: https://quarry.wmflabs.org/
> query/12719
>
> Another approach would be to create indexes. However, my permissions don't
> allow me to see what indexes currently exist (is there a way around that?)
> so I cannot decide what new indexes can be added.
>
> On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo <jcrespo at wikimedia.org>
> wrote:
>
>> So I cannot give you very specific advise, but based on the EXPLAIN, it
>> seems that you may be trying to read too many rows, up to
>> 100K * 300K * 300K (10 000 M) and then sort them, which is probably going
>> to be very slow.
>>
>> If they used to work in the past there could be 2 probable explanations:
>> a lot of rows have been inserted recently on one or several of the tables
>> (e.g. a new template or category with many members) or the indexes have
>> changed. Note that redoing the queries is something that we have to do
>> constantly in production because rows quantities change. I also can think
>> of this change that happened recently on production, but cannot say for
>> sure if it is related or could affect you negatively:
>>
>> https://phabricator.wikimedia.org/T139090
>>
>> Maybe that affects your query and it is as easy to fix as reordering your
>> columns/changing slightly the filters or its order. If someone has a
>> suggestion to make it faster, that requires a change to labs, remember that
>> labs replicas are not static, and new indexes can be added if needed by the
>> community, and I will gladly apply them myself.
>>
>> On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <huji.huji at gmail.com> wrote:
>>
>>> With two connections to the DB via Terminal, it is possible! Here is
>>> what I got; it matches what you sent above too:
>>>
>>>
>>>
>>> MariaDB [fawiki_p]> show processlist;
>>> +---------+--------+-------------------+----------+---------
>>> +------+----------------------------+-----------------------
>>> ------------------------------------------------------------
>>> -------------------+----------+
>>> | Id      | User   | Host              | db       | Command | Time |
>>> State                      | Info
>>>
>>> | Progress |
>>> +---------+--------+-------------------+----------+---------
>>> +------+----------------------------+-----------------------
>>> ------------------------------------------------------------
>>> -------------------+----------+
>>> | 3556437 | s51403 | 10.68.23.58:53391 | fawiki_p | Query   |   19 |
>>> Queried about 5450000 rows | select page_title, count(ll_lang) from page
>>> join category on page_title = cat_title left join catego |    0.000 |
>>> | 3556865 | s51403 | 10.68.23.58:54083 | fawiki_p | Query   |    0 |
>>> init                       | show processlist
>>>                                                                   |
>>> 0.000 |
>>> +---------+--------+-------------------+----------+---------
>>> +------+----------------------------+-----------------------
>>> ------------------------------------------------------------
>>> -------------------+----------+
>>> 2 rows in set (0.00 sec)
>>>
>>> MariaDB [fawiki_p]> show explain for 3556437;
>>> +------+-------------+---------------+--------+-------------
>>> ---------------------------+-----------------------------+--
>>> -------+------------------------+--------+------------------
>>> ----------------------------+
>>> | id   | select_type | table         | type   |
>>> possible_keys                          | key                         |
>>> key_len | ref                    | rows   | Extra
>>> |
>>> +------+-------------+---------------+--------+-------------
>>> ---------------------------+-----------------------------+--
>>> -------+------------------------+--------+------------------
>>> ----------------------------+
>>> |    1 | SIMPLE      | page          | ref    |
>>> name_title,page_redirect_namespace_len | page_redirect_namespace_len |
>>> 5       | const,const            | 364177 | Using where; Using temporary;
>>> Using filesort |
>>> |    1 | SIMPLE      | category      | eq_ref |
>>> cat_title                              | cat_title                   |
>>> 257     | fawiki.page.page_title |      1 | Using
>>> index                                  |
>>> |    1 | SIMPLE      | categorylinks | ref    |
>>> cl_timestamp,cl_sortkey                | cl_timestamp                |
>>> 257     | fawiki.page.page_title |     12 | Using where; Using
>>> index                     |
>>> |    1 | SIMPLE      | templatelinks | ref    |
>>> tl_from,tl_namespace                   | tl_namespace                |
>>> 4       | const                  | 390622 | Using where; Using
>>> index                     |
>>> |    1 | SIMPLE      | langlinks     | ref    |
>>> ll_from                                | ll_from                     |
>>> 4       | fawiki.page.page_id    | 104911 | Using
>>> index                                  |
>>> +------+-------------+---------------+--------+-------------
>>> ---------------------------+-----------------------------+--
>>> -------+------------------------+--------+------------------
>>> ----------------------------+
>>>
>>>
>>> On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee <huji.huji at gmail.com> wrote:
>>>
>>>> Oh, I misread your comment.
>>>>
>>>> I doubt SHOW EXPLAIN would work via Quarry, so I am trying to get it to
>>>> work via terminal access to Labs. I wonder though if it works for a query
>>>> that fails to complete.
>>>>
>>>> On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo <jcrespo at wikimedia.org>
>>>> wrote:
>>>>
>>>>> Huji, note I suggested SHOW EXPLAIN, not EXPLAIN; it is a different
>>>>> command, check the link I provided and report if that doesn't work for you.
>>>>>
>>>>> On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee <huji.huji at gmail.com> wrote:
>>>>>
>>>>>> See https://quarry.wmflabs.org/query/12718 and the error message
>>>>>> returned.
>>>>>>
>>>>>> EXPLAIN only works if you have access to the underlying tables. I
>>>>>> only have access to the VIEWs built on top of actual wiki tables (for good
>>>>>> reason, same applies to most of us) so I get an error when I try EXPLAIN.
>>>>>>
>>>>>> On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo <jcrespo at wikimedia.org
>>>>>> > wrote:
>>>>>>
>>>>>>> Explain (you should be able to run SHOW EXPLAIN on your own queries:
>>>>>>> http://s.petrunia.net/blog/?p=89):
>>>>>>>
>>>>>>> EXPLAIN select page_title, count(ll_lang) from page join category on
>>>>>>> page_title = cat_title left join categorylinks on page_title = cl_to left
>>>>>>> join templatelinks on tl_from = page_id and tl_namespace = 10 and tl_title
>>>>>>> in (   'رده_خالی' , 'رده_بهتر', 'رده_ابهام‌زدایی', 'رده_ردیابی‌کردن' ) left
>>>>>>> join langlinks on page_id = ll_from where page_namespace = 14 and
>>>>>>> page_is_redirect = 0 and cl_to is null and tl_title is null group by
>>>>>>> page_title order by 2, 1 limit 5000\G
>>>>>>> *************************** 1. row ***************************
>>>>>>>            id: 1
>>>>>>>   select_type: SIMPLE
>>>>>>>         table: page
>>>>>>>          type: ref
>>>>>>> possible_keys: name_title,page_redirect_namespace_len
>>>>>>>           key: page_redirect_namespace_len
>>>>>>>       key_len: 5
>>>>>>>           ref: const,const
>>>>>>>          rows: 364273
>>>>>>>         Extra: Using where; Using temporary; Using filesort
>>>>>>> *************************** 2. row ***************************
>>>>>>>            id: 1
>>>>>>>   select_type: SIMPLE
>>>>>>>         table: category
>>>>>>>          type: eq_ref
>>>>>>> possible_keys: cat_title
>>>>>>>           key: cat_title
>>>>>>>       key_len: 257
>>>>>>>           ref: fawiki.page.page_title
>>>>>>>          rows: 1
>>>>>>>         Extra: Using index
>>>>>>> *************************** 3. row ***************************
>>>>>>>            id: 1
>>>>>>>   select_type: SIMPLE
>>>>>>>         table: categorylinks
>>>>>>>          type: ref
>>>>>>> possible_keys: cl_timestamp,cl_sortkey
>>>>>>>           key: cl_timestamp
>>>>>>>       key_len: 257
>>>>>>>           ref: fawiki.page.page_title
>>>>>>>          rows: 12
>>>>>>>         Extra: Using where; Using index
>>>>>>> *************************** 4. row ***************************
>>>>>>>            id: 1
>>>>>>>   select_type: SIMPLE
>>>>>>>         table: templatelinks
>>>>>>>          type: ref
>>>>>>> possible_keys: tl_from,tl_namespace
>>>>>>>           key: tl_namespace
>>>>>>>       key_len: 4
>>>>>>>           ref: const
>>>>>>>          rows: 390610
>>>>>>>         Extra: Using where; Using index
>>>>>>> *************************** 5. row ***************************
>>>>>>>            id: 1
>>>>>>>   select_type: SIMPLE
>>>>>>>         table: langlinks
>>>>>>>          type: ref
>>>>>>> possible_keys: ll_from
>>>>>>>           key: ll_from
>>>>>>>       key_len: 4
>>>>>>>           ref: fawiki.page.page_id
>>>>>>>          rows: 104910
>>>>>>>         Extra: Using index
>>>>>>>
>>>>>>> On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee <huji.huji at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have a query [1] which I run weekly to identify empty categories;
>>>>>>>> those that don't have interwiki links and stay empty for a while are then
>>>>>>>> deleted by a sysop.
>>>>>>>>
>>>>>>>> The query ran just fine every week until Sep 2nd. [2] Since then,
>>>>>>>> the query times out and the page doesn't get updated.
>>>>>>>>
>>>>>>>> The query is no that complex (a SELECT with five JOINs) and used to
>>>>>>>> finish in about 3-5 minutes. I don't have EXPLAIN access on Labs so I
>>>>>>>> cannot tell what is slowing it down. Can someone kindly take a look and
>>>>>>>> advise why this suddenly stopped working?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Huji
>>>>>>>>
>>>>>>>> [1] https://quarry.wmflabs.org/query/3760
>>>>>>>> [2] https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A
>>>>>>>> 9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%
>>>>>>>> D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B
>>>>>>>> 1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%
>>>>>>>> DB%8C&action=history&uselang=en
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Labs-l mailing list
>>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jaime Crespo
>>>>>>> <http://wikimedia.org>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Labs-l mailing list
>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Labs-l mailing list
>>>>>> Labs-l at lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jaime Crespo
>>>>> <http://wikimedia.org>
>>>>>
>>>>> _______________________________________________
>>>>> Labs-l mailing list
>>>>> Labs-l at lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> Labs-l mailing list
>>> Labs-l at lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>
>>>
>>
>>
>> --
>> Jaime Crespo
>> <http://wikimedia.org>
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20160928/d05c70b2/attachment-0001.html>


More information about the Labs-l mailing list