Re: [Wikitech-l] Re: Optimizing and tuning

1 Feb 2003

On sab, 2003-02-01 at 07:12, Jan Hidders wrote:
...
  Sorry for replying to myself, but I just saw in CVS
that the SQL is
 something like this:

   SELECT HIGH_PRIORITY cur_id
   FROM cur
   WHERE cur_namespace=" . "{$ns} AND cur_title='" . wfStrencode( $t )
. "'";

 but if I look in buildTables.inc we have a separate index for namespace and
 title. That should really be a combined index and that holds for everywhere
 we do a look-up with namespace + title. Do we have those? 
Combined index might make sense. In most cases it won't significantly
cut down the number of rows, but for some (main page, current events,
user:maveric149) it could take out a couple hundred rows by splitting
the page and talk page.

Also, a vaguely related question on the watchlist query. Currently the
return-whole-watchlist query looks like:
SELECT DISTINCT
  cur_id,cur_namespace,cur_title,cur_comment,
  cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new
  FROM cur,watchlist
  WHERE wl_user={$uid} AND wl_title=cur_title
        AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1)
  ORDER BY cur_timestamp DESC {$dolimit}

I did an 'explain' using my uid. (I have 1370 titles in my watchlist;
the larger # of estimated returned rows is presumably due to the weird
InnoDB optimizer, which is known to return sometimes-wrong numbers of
rows.)

+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| table     | type | possible_keys           | key       | key_len | ref                |
rows | Extra                   |
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| watchlist | ref  | wl_user                 | wl_user   |       4 | const              |
3264 | where used; Using index; Using temporary; Using filesort |
| cur       | ref  | cur_namespace,cur_title | cur_title |     255 | watchlist.wl_title | 
  1 | where used                    |
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+

Is there a better way of doing this? We have a combined namespace/title
index on watchlist; but we want to pick up two variants in the
namespace. In most cases, just using the title as index on cur should
then be sufficient (it's relatively rare for titles to exist in multiple
groups of namespaces, mostly just in one namespace and the X_talk: next
to it).

...
  And I'm not sure if we really need the resulting
cur_id. If we don't, then
 indeed only the index is needed and that would really speed up the query. 
99% of the time, we don't; we're just checking for existence. It should
be sufficient to just have a boolean 'exists/doesn't exist' field. When
we're checking existence, is it most efficient to SELECT the indexed
field, or COUNT(*), or what?

-- brion vibber (brion @ pobox.com)

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [Wikitech-l] Re: Optimizing and tuning