[Labs-l] Query optimization

Jaime Crespo jcrespo at wikimedia.org
Wed Mar 9 08:17:13 UTC 2016


I would recommend starting with the appropriate Special page:
[[Special:MostLinkedCategories]]
That generates the 500 categories with most members and is refreshed
every few days.

If that wasn't enough, I would do:

 SELECT max(cat_id) FROM category;

to get the last cat_id, and then, in small, manageable transactions, run:

 SELECT page_title, cat_pages, cat_subcats, page_id
 FROM category JOIN page ON category.cat_title = page.page_title AND
page.page_namespace = 14
 LEFT JOIN langlinks ON page_id = ll_from and ll_lang = 'fa'
 WHERE cat_id BETWEEN ? and ? and cat_pages > 500 and ll_from is null;


where the first 2 "?" are small ranges to cover, in small increments
(e.g. 10000 records each time), between 1 and max(cat_id). Change
"cat_pages > 500" to whatever fits you.

Double check the where/on s, not 100% sure they are right, but it
should be close to what you want.

On Wed, Mar 9, 2016 at 4:38 AM, John <phoenixoverride at gmail.com> wrote:
> I don't have it handy but category membership counts are stored in the db
> use that instead of a count()
>
>
> On Tuesday, March 8, 2016, Huji Lee <huji.huji at gmail.com> wrote:
>>
>> I am trying to find a list of those categories on EN WP that are highly
>> used (more than 100 subpages or subcats) but don't have a counterpart in FA
>> WP. The query I used is shown below and also on
>> https://quarry.wmflabs.org/query/7943 and is extremely slow.
>>
>> Any thoughts on how to make it more efficient?
>>
>> select page_title, ll_title, count(cl_to) as CNT
>> from page
>> left join langlinks
>> on ll_from = page_id
>> and ll_lang = 'fa'
>> join categorylinks
>> on page_title = cl_to
>> where page_namespace = 14
>> and ll_from is null
>> group by cl_to, ll_title
>> having CNT > 100
>> order by CNT desc;
>>
>>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>



-- 
Jaime Crespo
<http://wikimedia.org>



More information about the Labs-l mailing list