I think that the problem, at least with this specific query, is with the query structure. page_namespace and rev_user are both already indexed in the page and revision tables, but the query planner doesn't optimize the join "correctly" (maybe coz the page_namespace index is long). Nevertheless, the query is slow because there isn't a user_page table and you have to dig in the huge revision table, so if you explicitly dump all the rev_page data into a derived table that might help your query speed.
For example, try this alternate version:
SELECT COUNT(*), p.page_namespace FROM page p JOIN (SELECT DISTINCT rev_page FROM revision WHERE rev_user = XXXXX) r ON (r.rev_page = p.page_id) GROUP BY p.page_namespace ORDER BY p.page_namespace
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 14202 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: r.rev_page rows: 1 Extra: *************************** 3. row *************************** id: 2 select_type: DERIVED table: revision type: ref possible_keys: user_timestamp key: user_timestamp key_len: 4 ref: rows: 79640 Extra: Using where; Using index; Using temporary
-darren
On 15 September 2008, at 2:18 PM, Aryeh Gregor wrote:
On Mon, Sep 15, 2008 at 1:48 PM, Victor Vasiliev vasilvv@gmail.com wrote:
Is it possible to locally create some indexes specific for toolserver? As for now, it will help to seriously speed up most tools. E.g. index (rev_user,rev_page) will speed up all editcounters, which usually uses query like SELECT COUNT(*), page_namespace FROM page JOIN revision ON page_id = rev_page WHERE rev_user = 37880 GROUP BY page_namespace ORDER BY page_namespace;
Why would that index speed up the access noticeably? There's already an index on (rev_user, rev_timestamp) (assuming that the toolserver has stock indexes), and rev_page is in the leaf nodes.
If optimization is needed here, a root should install the microslow patch and do an analysis of what's taking up the most time, and see if those can be indexed or otherwise improved. At any rate, an ALTER TABLE would require considerable downtime, since as far as I know there's only one toolserver database server, so you can't just throw all the load onto the other slaves while you take one out of rotation. (Or are there multiples these days?)
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
explain SELECT COUNT(*), p.page_namespace FROM page p JOIN revision r ON (p.page_id = r.rev_page) WHERE r.rev_user = 37880 GROUP BY p.page_namespace ORDER BY p.page_namespace \G drh08@nightshade:~$ sql enwiki_p < test.sql *************************** 1. row *************************** id: 1 select_type: SIMPLE table: revision type: ref possible_keys: PRIMARY,page_timestamp,user_timestamp key: user_timestamp key_len: 4 ref: const rows: 1 Extra: Using index; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: enwiki.revision.rev_page rows: 1 Extra: