I think that the problem, at least with this specific query, is with
the query structure. page_namespace and rev_user are both already
indexed in the page and revision tables, but the query planner doesn't
optimize the join "correctly" (maybe coz the page_namespace index is
long). Nevertheless, the query is slow because there isn't a user_page
table and you have to dig in the huge revision table, so if you
explicitly dump all the rev_page data into a derived table that might
help your query speed.
For example, try this alternate version:
SELECT COUNT(*), p.page_namespace
FROM page p
JOIN (SELECT DISTINCT rev_page FROM revision WHERE rev_user = XXXXX) r
ON (r.rev_page = p.page_id)
GROUP BY p.page_namespace
ORDER BY p.page_namespace
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 14202
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: page
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: r.rev_page
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: revision
type: ref
possible_keys: user_timestamp
key: user_timestamp
key_len: 4
ref:
rows: 79640
Extra: Using where; Using index; Using temporary
-darren
On 15 September 2008, at 2:18 PM, Aryeh Gregor wrote:
On Mon, Sep 15, 2008 at 1:48 PM, Victor Vasiliev
<vasilvv(a)gmail.com>
wrote:
Is it possible to locally create some indexes
specific for
toolserver?
As for now, it will help to seriously speed up most tools. E.g. index
(rev_user,rev_page) will speed up all editcounters, which usually
uses
query like
SELECT COUNT(*), page_namespace FROM page JOIN revision ON page_id =
rev_page WHERE rev_user = 37880 GROUP BY page_namespace ORDER BY
page_namespace;
Why would that index speed up the access noticeably? There's already
an index on (rev_user, rev_timestamp) (assuming that the toolserver
has stock indexes), and rev_page is in the leaf nodes.
If optimization is needed here, a root should install the microslow
patch and do an analysis of what's taking up the most time, and see if
those can be indexed or otherwise improved. At any rate, an ALTER
TABLE would require considerable downtime, since as far as I know
there's only one toolserver database server, so you can't just throw
all the load onto the other slaves while you take one out of rotation.
(Or are there multiples these days?)
_______________________________________________
Toolserver-l mailing list
Toolserver-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
explain
SELECT COUNT(*), p.page_namespace
FROM page p
JOIN revision r ON (p.page_id = r.rev_page)
WHERE r.rev_user = 37880
GROUP BY p.page_namespace
ORDER BY p.page_namespace
\G
drh08@nightshade:~$ sql enwiki_p < test.sql
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: revision
type: ref
possible_keys: PRIMARY,page_timestamp,user_timestamp
key: user_timestamp
key_len: 4
ref: const
rows: 1
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: page
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: enwiki.revision.rev_page
rows: 1
Extra: