[Labs-l] Database Indexing Doesn't seem to be working

Brad Jorsch (Anomie) bjorsch at wikimedia.org
Thu Nov 13 19:00:28 UTC 2014


On Wed, Nov 12, 2014 at 11:06 AM, Maximilian Doerr <
maximilian.doerr at gmail.com> wrote:

> My knowledge of MySQL is pathetic to be honest.  It’s definitely something
> I should familiarize myself with.  Could you explain that to me so a pickle
> could understand it? :/
>

Unfortunately, I don't speak pickle. But I'll try to clarify.


"enwiki_p.revision_userindex" is a view (along with everything else in
enwiki_p, I believe), which is basically a pretend table that's really
populated by a SELECT query (in this case, from "enwiki.revision"). You can
use "SHOW CREATE TABLE enwiki_p.revision_userindex;" to see that query if
you want. So when you select from the view, it basically rewrites your
query to incorporate that other select.

>
> +------+-------------+----------+--------+---------------------------------------+----------------+---------+--------------------------+-------+-----------------------------+
> | id   | select_type | table    | type   |
> possible_keys                         | key            | key_len |
> ref                      | rows  | Extra                       |
>
> +------+-------------+----------+--------+---------------------------------------+----------------+---------+--------------------------+-------+-----------------------------+
> |    1 | SIMPLE      | revision | ref    |
> PRIMARY,page_timestamp,user_timestamp | user_timestamp | 4       |
> const                    | 29186 | Using where; Using filesort |
> |    1 | SIMPLE      | page     | eq_ref |
> PRIMARY                               | PRIMARY        | 4       |
> enwiki.revision.rev_page |     1 |                             |
>
The "key" column here tells us that it is in fact using the user_timestamp
index to find the rows in the revision table. Which is good, indexing isn't
broken.

The "Extra" column tells us a few things.

   - It doesn't have "Using index", so it's having to actually fetch the
   row from the table instead of just getting what it needs from the index (in
   this case, that's because the view filters based on rev_deleted). That
   makes things slightly slower, especially if you're having to hit cold pages
   (i.e. blocks from the disk that aren't already cached in memory).
   - "Using where" tells us that it might to throw away some of the rows
   that it fetched, thanks to a WHERE clause (again, the rev_deleted check).
   - "Using filesort" means that it's going to load all the rows into
   memory (or worse, write them to a temp file) and then sort them to put them
   in ORDER BY order. This can get really slow if there are a lot of them. I
   don't know why it's deciding it needs to do this here, since the
   user_timestamp index should already be giving the rows in the asked-for
   order.

And then, of course, it's having to fetch rows from the page table as well,
which could again be hitting cold pages.


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20141113/4ccc2afb/attachment.html>


More information about the Labs-l mailing list