[Labs-l] Empty Cat query
Huji Lee
huji.huji at gmail.com
Fri Sep 30 14:18:37 UTC 2016
T
hanks Bruce that was really smart!
On Thu, Sep 29, 2016 at 3:26 PM, Bruce Myers <bruce at brucemyers.com> wrote:
> The query is using the tl_namespace index (slow) instead of the tl_from
> index (fast). Since you can't force an index in a view, I tried removing
> the 'and tl_namespace = 10' qualifier and it ran in 40 seconds. It used the
> tl_from index. Since templatelinks usually point to the 10 namespace
> anyway, there shouldn't be to many, if any, false positives by leaving the
> qualifier out.
>
>
> On 09/28/2016 03:03 PM, Huji Lee wrote:
>
> I wanted to revive this thread: can someone please help me find a way to
> boost that query?
>
> On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <huji.huji at gmail.com> wrote:
>
>> Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that query
>> every few minutes until it timed out, and the number rows it was querying
>> increased drastically. It very quickly reached 2399220000, with the last
>> value I go before timeout being 3693530000. That is just way too many.
>>
>> T139090 could be the cause; it went into effect the day after last
>> successful query, and it does affect indexes used by the query.
>>
>> One thing that can obviously make my query faster is to force it to run
>> the joins in a particular order. I tried to enforce it through creating
>> temporary tables in memory, but it failed: https://quarry.wmflabs.org/que
>> ry/12719
>>
>> Another approach would be to create indexes. However, my permissions
>> don't allow me to see what indexes currently exist (is there a way around
>> that?) so I cannot decide what new indexes can be added.
>>
>> On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo <jcrespo at wikimedia.org>
>> wrote:
>>
>>> So I cannot give you very specific advise, but based on the EXPLAIN, it
>>> seems that you may be trying to read too many rows, up to
>>> 100K * 300K * 300K (10 000 M) and then sort them, which is probably
>>> going to be very slow.
>>>
>>> If they used to work in the past there could be 2 probable explanations:
>>> a lot of rows have been inserted recently on one or several of the tables
>>> (e.g. a new template or category with many members) or the indexes have
>>> changed. Note that redoing the queries is something that we have to do
>>> constantly in production because rows quantities change. I also can think
>>> of this change that happened recently on production, but cannot say for
>>> sure if it is related or could affect you negatively:
>>>
>>> https://phabricator.wikimedia.org/T139090
>>>
>>> Maybe that affects your query and it is as easy to fix as reordering
>>> your columns/changing slightly the filters or its order. If someone has a
>>> suggestion to make it faster, that requires a change to labs, remember that
>>> labs replicas are not static, and new indexes can be added if needed by the
>>> community, and I will gladly apply them myself.
>>>
>>> On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <huji.huji at gmail.com> wrote:
>>>
>>>> With two connections to the DB via Terminal, it is possible! Here is
>>>> what I got; it matches what you sent above too:
>>>>
>>>>
>>>>
>>>> MariaDB [fawiki_p]> show processlist;
>>>> +---------+--------+-------------------+----------+---------
>>>> +------+----------------------------+-----------------------
>>>> ------------------------------------------------------------
>>>> -------------------+----------+
>>>> | Id | User | Host | db | Command | Time |
>>>> State | Info
>>>>
>>>> | Progress |
>>>> +---------+--------+-------------------+----------+---------
>>>> +------+----------------------------+-----------------------
>>>> ------------------------------------------------------------
>>>> -------------------+----------+
>>>> | 3556437 | s51403 | 10.68.23.58:53391 | fawiki_p | Query | 19 |
>>>> Queried about 5450000 rows | select page_title, count(ll_lang) from page
>>>> join category on page_title = cat_title left join catego | 0.000 |
>>>> | 3556865 | s51403 | 10.68.23.58:54083 | fawiki_p | Query | 0 |
>>>> init | show processlist
>>>> |
>>>> 0.000 |
>>>> +---------+--------+-------------------+----------+---------
>>>> +------+----------------------------+-----------------------
>>>> ------------------------------------------------------------
>>>> -------------------+----------+
>>>> 2 rows in set (0.00 sec)
>>>>
>>>> MariaDB [fawiki_p]> show explain for 3556437;
>>>> +------+-------------+---------------+--------+-------------
>>>> ---------------------------+-----------------------------+--
>>>> -------+------------------------+--------+------------------
>>>> ----------------------------+
>>>> | id | select_type | table | type |
>>>> possible_keys | key |
>>>> key_len | ref | rows | Extra
>>>> |
>>>> +------+-------------+---------------+--------+-------------
>>>> ---------------------------+-----------------------------+--
>>>> -------+------------------------+--------+------------------
>>>> ----------------------------+
>>>> | 1 | SIMPLE | page | ref |
>>>> name_title,page_redirect_namespace_len | page_redirect_namespace_len |
>>>> 5 | const,const | 364177 | Using where; Using temporary;
>>>> Using filesort |
>>>> | 1 | SIMPLE | category | eq_ref |
>>>> cat_title | cat_title |
>>>> 257 | fawiki.page.page_title | 1 | Using
>>>> index |
>>>> | 1 | SIMPLE | categorylinks | ref |
>>>> cl_timestamp,cl_sortkey | cl_timestamp |
>>>> 257 | fawiki.page.page_title | 12 | Using where; Using
>>>> index |
>>>> | 1 | SIMPLE | templatelinks | ref |
>>>> tl_from,tl_namespace | tl_namespace |
>>>> 4 | const | 390622 | Using where; Using
>>>> index |
>>>> | 1 | SIMPLE | langlinks | ref |
>>>> ll_from | ll_from |
>>>> 4 | fawiki.page.page_id | 104911 | Using
>>>> index |
>>>> +------+-------------+---------------+--------+-------------
>>>> ---------------------------+-----------------------------+--
>>>> -------+------------------------+--------+------------------
>>>> ----------------------------+
>>>>
>>>>
>>>> On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee <huji.huji at gmail.com> wrote:
>>>>
>>>>> Oh, I misread your comment.
>>>>>
>>>>> I doubt SHOW EXPLAIN would work via Quarry, so I am trying to get it
>>>>> to work via terminal access to Labs. I wonder though if it works for a
>>>>> query that fails to complete.
>>>>>
>>>>> On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo <jcrespo at wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Huji, note I suggested SHOW EXPLAIN, not EXPLAIN; it is a different
>>>>>> command, check the link I provided and report if that doesn't work for you.
>>>>>>
>>>>>> On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee <huji.huji at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> See https://quarry.wmflabs.org/query/12718 and the error message
>>>>>>> returned.
>>>>>>>
>>>>>>> EXPLAIN only works if you have access to the underlying tables. I
>>>>>>> only have access to the VIEWs built on top of actual wiki tables (for good
>>>>>>> reason, same applies to most of us) so I get an error when I try EXPLAIN.
>>>>>>>
>>>>>>> On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo <
>>>>>>> jcrespo at wikimedia.org> wrote:
>>>>>>>
>>>>>>>> Explain (you should be able to run SHOW EXPLAIN on your own
>>>>>>>> queries: http://s.petrunia.net/blog/?p=89):
>>>>>>>>
>>>>>>>> EXPLAIN select page_title, count(ll_lang) from page join category
>>>>>>>> on page_title = cat_title left join categorylinks on page_title = cl_to
>>>>>>>> left join templatelinks on tl_from = page_id and tl_namespace = 10 and
>>>>>>>> tl_title in ( 'رده_خالی' , 'رده_بهتر', 'رده_ابهامزدایی',
>>>>>>>> 'رده_ردیابیکردن' ) left join langlinks on page_id = ll_from where
>>>>>>>> page_namespace = 14 and page_is_redirect = 0 and cl_to is null and tl_title
>>>>>>>> is null group by page_title order by 2, 1 limit 5000\G
>>>>>>>> *************************** 1. row ***************************
>>>>>>>> id: 1
>>>>>>>> select_type: SIMPLE
>>>>>>>> table: page
>>>>>>>> type: ref
>>>>>>>> possible_keys: name_title,page_redirect_namespace_len
>>>>>>>> key: page_redirect_namespace_len
>>>>>>>> key_len: 5
>>>>>>>> ref: const,const
>>>>>>>> rows: 364273
>>>>>>>> Extra: Using where; Using temporary; Using filesort
>>>>>>>> *************************** 2. row ***************************
>>>>>>>> id: 1
>>>>>>>> select_type: SIMPLE
>>>>>>>> table: category
>>>>>>>> type: eq_ref
>>>>>>>> possible_keys: cat_title
>>>>>>>> key: cat_title
>>>>>>>> key_len: 257
>>>>>>>> ref: fawiki.page.page_title
>>>>>>>> rows: 1
>>>>>>>> Extra: Using index
>>>>>>>> *************************** 3. row ***************************
>>>>>>>> id: 1
>>>>>>>> select_type: SIMPLE
>>>>>>>> table: categorylinks
>>>>>>>> type: ref
>>>>>>>> possible_keys: cl_timestamp,cl_sortkey
>>>>>>>> key: cl_timestamp
>>>>>>>> key_len: 257
>>>>>>>> ref: fawiki.page.page_title
>>>>>>>> rows: 12
>>>>>>>> Extra: Using where; Using index
>>>>>>>> *************************** 4. row ***************************
>>>>>>>> id: 1
>>>>>>>> select_type: SIMPLE
>>>>>>>> table: templatelinks
>>>>>>>> type: ref
>>>>>>>> possible_keys: tl_from,tl_namespace
>>>>>>>> key: tl_namespace
>>>>>>>> key_len: 4
>>>>>>>> ref: const
>>>>>>>> rows: 390610
>>>>>>>> Extra: Using where; Using index
>>>>>>>> *************************** 5. row ***************************
>>>>>>>> id: 1
>>>>>>>> select_type: SIMPLE
>>>>>>>> table: langlinks
>>>>>>>> type: ref
>>>>>>>> possible_keys: ll_from
>>>>>>>> key: ll_from
>>>>>>>> key_len: 4
>>>>>>>> ref: fawiki.page.page_id
>>>>>>>> rows: 104910
>>>>>>>> Extra: Using index
>>>>>>>>
>>>>>>>> On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee <huji.huji at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have a query [1] which I run weekly to identify empty
>>>>>>>>> categories; those that don't have interwiki links and stay empty for a
>>>>>>>>> while are then deleted by a sysop.
>>>>>>>>>
>>>>>>>>> The query ran just fine every week until Sep 2nd. [2] Since then,
>>>>>>>>> the query times out and the page doesn't get updated.
>>>>>>>>>
>>>>>>>>> The query is no that complex (a SELECT with five JOINs) and used
>>>>>>>>> to finish in about 3-5 minutes. I don't have EXPLAIN access on Labs so I
>>>>>>>>> cannot tell what is slowing it down. Can someone kindly take a look and
>>>>>>>>> advise why this suddenly stopped working?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Huji
>>>>>>>>>
>>>>>>>>> [1] https://quarry.wmflabs.org/query/3760
>>>>>>>>> [2] https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A
>>>>>>>>> 9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%
>>>>>>>>> D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B
>>>>>>>>> 1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%
>>>>>>>>> DB%8C&action=history&uselang=en
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Labs-l mailing list
>>>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jaime Crespo
>>>>>>>> <http://wikimedia.org>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Labs-l mailing list
>>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Labs-l mailing list
>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jaime Crespo
>>>>>> <http://wikimedia.org>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Labs-l mailing list
>>>>>> Labs-l at lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Labs-l mailing list
>>>> Labs-l at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>
>>>>
>>>
>>>
>>> --
>>> Jaime Crespo
>>> <http://wikimedia.org>
>>>
>>> _______________________________________________
>>> Labs-l mailing list
>>> Labs-l at lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>
>>>
>>
>
>
> _______________________________________________
> Labs-l mailing listLabs-l at lists.wikimedia.orghttps://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20160930/1864cb99/attachment-0001.html>
More information about the Labs-l
mailing list