[Labs-l] Query optimization

Huji Lee huji.huji at gmail.com
Sat Mar 12 17:20:55 UTC 2016


On second look, it would be ideal for me to get the counts of "articles"
(not pages) in the category, something that cannot be achieved using the
*category* table. I am going back to my old query again, the slow one.
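
For illustration, a minimal sketch of the kind of per-article count I mean
(not my exact query; 'Some_category' is a placeholder, and the column names
assume the standard MediaWiki categorylinks/page schema):

  SELECT COUNT(*)
  FROM categorylinks
  JOIN page ON page_id = cl_from
  WHERE cl_to = 'Some_category'
    AND page_namespace = 0;

Counting only namespace-0 members means visiting every member row of the
category, which is exactly the slow part.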


On Sat, Mar 12, 2016 at 11:52 AM, Huji Lee <huji.huji at gmail.com> wrote:

> Thanks John. I avoided the category table because it is not 100% consistent
> with categorylinks, but the join you used takes care of that. This is an
> excellent solution to my particular problem.
>
> On Thu, Mar 10, 2016 at 6:48 AM, John <phoenixoverride at gmail.com> wrote:
>
>> Take a look at:
>>
>>   SELECT *
>>   FROM category
>>   LEFT JOIN page ON page_title = cat_title
>>   WHERE page_namespace = 14
>>     AND cat_pages > 500
>>     AND 'fa' NOT IN (SELECT ll_lang FROM langlinks WHERE ll_from = page_id);
>>
>> It works very quickly and gives what you need.
>>
>>
>> On Thu, Mar 10, 2016 at 4:09 AM, Jaime Crespo <jcrespo at wikimedia.org>
>> wrote:
>>
>>> On Thu, Mar 10, 2016 at 9:24 AM, Purodha Blissenbach
>>> <purodha at blissenbach.org> wrote:
>>> > I do not know how replication works, but I did and do assume it
>>> > replicates indexes. If so, the online manual starting at
>>> > https://www.mediawiki.org/wiki/Manual:Database_layout
>>> > may help you.
>>>
>>> Correct. There are some cases where the indexing doesn't work (it does
>>> not make sense to use certain indexes whose columns are partially
>>> filtered on the view). In those cases, additional views are created,
>>> such as "revision_userindex", etc.
>>>
>>> Maybe we can add a "columns" table to information_schema_p, or add the
>>> CREATE TABLE statements to the "tables" table, so the original table
>>> definitions are documented. File a ticket if that would be useful.
>>>
>>> If several users run the same slow analytics queries, file a ticket
>>> and we could, if technically possible, compute them in advance so they
>>> do not have to be run every single time. Another option is to
>>> replicate to a column-based engine (where aggregations are cheap), but
>>> I am not sure people would know how to take full advantage of it.
>>>
>>> > Shouldn't a simple join on indexed fields + a simple aggregation on an
>>> > indexed field be relatively cheap?
>>>
>>> No, aggregation on MySQL's standard engines is never cheap. Indexes do
>>> not help with that; the server still has to count every single row (so
>>> you are effectively doing a filtered full table scan of one of the
>>> largest tables per wiki). Why do that when the count is already
>>> aggregated?
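>>>
>>> To illustrate (a sketch; 'Some_category' is a placeholder), this has
>>> to count every member row of the category:
>>>
>>>   SELECT COUNT(*) FROM categorylinks WHERE cl_to = 'Some_category';
>>>
>>> whereas this reads a single precomputed value:
>>>
>>>   SELECT cat_pages FROM category WHERE cat_title = 'Some_category';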
>>>
>>> I recently delivered a session on query optimization for MySQL and
>>> MariaDB, using Wikimedia and OSM as examples [warning, external site]
>>> <https://github.com/jynus/query-optimization>; maybe that would be
>>> helpful?
>>>
>>> --
>>> Jaime Crespo
>>> <http://wikimedia.org>
>>>