[Labs-l] Query optimization

Thu Mar 10 09:09:50 UTC 2016

On Thu, Mar 10, 2016 at 9:24 AM, Purodha Blissenbach
<purodha at blissenbach.org> wrote:
> I do not know how replication works, but I did and do assume, it
> replicates indexes. If so, the online manual starting at
> https://www.mediawiki.org/wiki/Manual:Database_layout
> may help you.

Correct. There are some cases where the indexing doesn't work (it
doesn't make sense to use certain indexes that are partially filtered
on the view). In those cases, new views are created, such as
"revision_userindex", etc.

Maybe we can add a "columns" table to information_schema_p, or the
create tables to the "tables" table, where the original table
description is explained. File a ticket if that would be useful.

If there are several users that perform common slow analytics, file a
ticket and we could, if technically possible, do them in advance so
they do not have to be done every single time. Another option is to
replicate to a column-based engine, but I am not sure if people would
know how to take most advantage of those (where aggregations are
cheap).

> Shouldn't a simple join on indexed fields + a simple aggregation on an indexed field be relatively cheap

No, aggregation on mysql, on standard engines, is never cheap. Indexes
do not help with that, they still need to count every single row (so
you are creating a filter, of a sort, of a full tablescan of one of
the largest tables per wiki). Why doing that when the count is already
aggregated?

I recently delivered a session on query optimization for mysql and
mariadb, using wikimedia and osm as examples [warning, external site]
<https://github.com/jynus/query-optimization>, maybe that would be
helpful?

-- 
Jaime Crespo
<http://wikimedia.org>