Slow query log from Friday morning to present: http://www.wikipedia.org/tools/slowquerylog.gz
-- brion vibber (brion @ pobox.com)
Slow query log from Friday morning to present: http://www.wikipedia.org/tools/slowquerylog.gz
I notice more slow queries than you'd expect on user_newtalk. It's a bone-simple query, fully indexed: EXPLAIN SELECT 1 FROM user_newtalk WHERE user_ip='some_ip_address'; +--------------+------+---------------+---------+---------+-------+------+-------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +--------------+------+---------------+---------+---------+-------+------+-------------------------+ | user_newtalk | ref | user_ip | user_ip | 40 | const | 1 | where used; Using index | +--------------+------+---------------+---------+---------+-------+------+-------------------------+
or:
EXPLAIN SELECT 1 FROM user_newtalk WHERE user_id=some_user_id_number; +--------------+------+---------------+---------+---------+-------+------+-------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +--------------+------+---------------+---------+---------+-------+------+-------------------------+ | user_newtalk | ref | user_id | user_id | 4 | const | 1 | where used; Using index | +--------------+------+---------------+---------+---------+-------+------+-------------------------+
One of these two queries happens on *every* page view; the table itself is quite small (a few dozen rows generally), and 99% of the time queries will turn up no rows. It's only altered when a user talk page is changed, or viewed by its owner.
CREATE TABLE `user_newtalk` ( `user_id` int(5) NOT NULL default '0', `user_ip` varchar(40) NOT NULL default '', KEY `user_ip` (`user_ip`), KEY `user_id` (`user_id`) ) TYPE=MyISAM
I'm assuming that the sometime slowness here is a side effect of other things being slow somehow.
-- brion vibber (brion @ pobox.com)
Thanks for the slow query log - I don't have the db in front of me now, but I'll take a look later. It may be worthwhile changing user_newtalk to InnoDb to see what happens. It's currently MyISAM, and if the talk pages are being updated frequently table locking may be a problem here, so row level locking may be more efficient.
Quoting Brion Vibber brion@pobox.com:
Slow query log from Friday morning to present: http://www.wikipedia.org/tools/slowquerylog.gz
I notice more slow queries than you'd expect on user_newtalk. It's a bone-simple query, fully indexed: EXPLAIN SELECT 1 FROM user_newtalk WHERE user_ip='some_ip_address'; +--------------+------+---------------+---------+---------+-------+------+----
---------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra | +--------------+------+---------------+---------+---------+-------+------+----
---------------------+
| user_newtalk | ref | user_ip | user_ip | 40 | const | 1 | where used; Using index | +--------------+------+---------------+---------+---------+-------+------+----
---------------------+
or:
EXPLAIN SELECT 1 FROM user_newtalk WHERE user_id=some_user_id_number; +--------------+------+---------------+---------+---------+-------+------+----
---------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra | +--------------+------+---------------+---------+---------+-------+------+----
---------------------+
| user_newtalk | ref | user_id | user_id | 4 | const | 1 | where used; Using index | +--------------+------+---------------+---------+---------+-------+------+----
---------------------+
One of these two queries happens on *every* page view; the table itself is quite small (a few dozen rows generally), and 99% of the time queries will turn up no rows. It's only altered when a user talk page is changed, or viewed by its owner.
CREATE TABLE `user_newtalk` ( `user_id` int(5) NOT NULL default '0', `user_ip` varchar(40) NOT NULL default '', KEY `user_ip` (`user_ip`), KEY `user_id` (`user_id`) ) TYPE=MyISAM
I'm assuming that the sometime slowness here is a side effect of other things being slow somehow.
-- brion vibber (brion @ pobox.com)
On sab, 2003-02-08 at 17:25, Ian Gilfillan wrote:
Thanks for the slow query log - I don't have the db in front of me now, but I'll take a look later.
Great, thanks for your help!
It may be worthwhile changing user_newtalk to InnoDb to see what happens. It's currently MyISAM, and if the talk pages are being updated frequently table locking may be a problem here, so row level locking may be more efficient.
It's updated very infrequently, and the slow query log lists the time spent as being in the query, not in waiting for locks. That's why I'm so stumped...
My recollection of looking at the SHOW PROCESS list during peak times is that these queries were usually sitting in "Statistics" or "Opening tables" state. (What _is_ "Statistics"? The online manual doesn't seem to explain it.)
It occurs to me that our table cache is only 256; we've got a max_connections of 250, and have a fair number of 2-table joins (searching, checking status of linked pages on view, recentchanges and watchlist) and, taking all wikis into account, some 400+ tables.
Lesse... | Open_tables | 256 | | Opened_tables | 6671 |
Yikes! I'm going to up it to about 1000 and see what that does...
-- brion vibber (brion @ pobox.com)
On sab, 2003-02-08 at 17:46, Brion Vibber wrote:
My recollection of looking at the SHOW PROCESS list during peak times is that these queries were usually sitting in "Statistics" or "Opening tables" state. (What _is_ "Statistics"? The online manual doesn't seem to explain it.)
I've managed to catch it in the act; attached are two snapshots of "SHOW PROCESSLIST" taken about 45 seconds apart. A whooole bunch of threads are stuck on the user_newtalk query in "Statistics" state; after about a minute they fall apart and there are a bunch of cur queries in "Closing tables" and "Opening tables".
-- brion vibber (brion @ pobox.com)
On dim, 2003-02-09 at 19:23, Brion Vibber wrote:
I've managed to catch it in the act; attached are two snapshots of "SHOW PROCESSLIST" taken about 45 seconds apart. A whooole bunch of threads are stuck on the user_newtalk query in "Statistics" state; after about a minute they fall apart and there are a bunch of cur queries in "Closing tables" and "Opening tables".
The only information I've dragged up so far on the 'statistics' state is this thread: http://forums.devshed.com/archive/4/2002/11/4/46232
Apparently it's something to do with joins... or indexes... or finding which indexes which are sometimes relevant to joins... Anyway, I'm not sure how much benefit an index has on a table with 75 rows and only constant SELECTs, 99% of which are checking for values that *don't* appear in the table. I've dumped the indexes on user_newtalk for now, see what that does...
-- brion vibber (brion @ pobox.com)
On dim, 2003-02-09 at 19:23, Brion Vibber wrote:
I've managed to catch it in the act; attached are two snapshots of "SHOW PROCESSLIST" taken about 45 seconds apart. A whooole bunch of threads are stuck on the user_newtalk query in "Statistics" state; after about a minute they fall apart and there are a bunch of cur queries in "Closing tables" and "Opening tables".
The only information I've dragged up so far on the 'statistics' state is this thread: http://forums.devshed.com/archive/4/2002/11/4/46232
Apparently it's something to do with joins... or indexes... or finding which indexes which are sometimes relevant to joins... Anyway, I'm not sure how much benefit an index has on a table with 75 rows and only constant SELECTs, 99% of which are checking for values that *don't* appear in the table. I've dumped the indexes on user_newtalk for now, see what that does...
Personally, I doubt that it is at all related to user_newtalk. If you check the slow query log, you will notice that we have queries that take up to
1000 seconds. While such a killer query runs, my guess is that everything else
slows down. Since user_newtalk is queried on every pageview (including recent changes etc., and for anons and non-anons), it's bound to show up in the slow query log more often than other queries. Even then, it is predictably the fastest among the slow queries. Similarly, simple queries like looking up a user by ID occasionally show up in slow query log, probably again due to general slowdowns of the server.
Is there a way to specify how long a query can take? Most queries are not critical, and maybe we should just timeout after 20 seconds and show an error message (which, technically, should *never* come up, as queries really shouldn't take that long).
Regards,
Erik
Aargh, this is frustrating - I only seem to get 5 minutes to look before I'm rushing off somewhere else.
I'd tend to agree with Erik that the user_newtalk is a symptom.
I'd just got looking at the log when I'm dragged away, but a few things that may help... - it's not the volume of queries, it's often the query that starts the backing up. So look at the first query in the log, and the first queries after a period of time running OK. Two that I saw this way are:
SELECT cur_id,cur_namespace,cur_title,cur_text FROM cur,searchindex WHERE cur_id=si_page AND (MATCH (si_text) AGAINST ('math characters') AND (cur_is_redirect=0) ) AND (cur_namespace=0) LIMIT 0, 20;
and
SELECT cur_id,cur_namespace,cur_title,cur_text FROM cur,searchindex WHERE cur_id=si_page AND (MATCH (si_text) AGAINST ('math characters') AND (cur_is_redirect=0) ) AND (cur_namespace=0) LIMIT 0, 20;
You can also usually spot locking problems by seeing a whole batch of slow queries that finish around the same time - look for the query causing the locking problem..
Other things I've spotted, upping the table cache is a good idea - dropping the index from user_newtalk probably will make little difference if the table is so small, but shouldn't help (stranger things have happened though)
And now I really have to run!!
----- Original Message ----- From: "Erik Moeller" erik_moeller@gmx.de To: wikitech-l@wikipedia.org Sent: Monday, February 10, 2003 5:28 PM Subject: Re: [Wikitech-l] Slow query log
On dim, 2003-02-09 at 19:23, Brion Vibber wrote:
I've managed to catch it in the act; attached are two snapshots of "SHOW PROCESSLIST" taken about 45 seconds apart. A whooole bunch of threads are stuck on the user_newtalk query in "Statistics" state; after about a minute they fall apart and there are a bunch of cur queries in "Closing tables" and "Opening tables".
The only information I've dragged up so far on the 'statistics' state is this thread: http://forums.devshed.com/archive/4/2002/11/4/46232
Apparently it's something to do with joins... or indexes... or finding which indexes which are sometimes relevant to joins... Anyway, I'm not sure how much benefit an index has on a table with 75 rows and only constant SELECTs, 99% of which are checking for values that *don't* appear in the table. I've dumped the indexes on user_newtalk for now, see what that does...
Personally, I doubt that it is at all related to user_newtalk. If you check the slow query log, you will notice that we have queries that take up to
1000 seconds. While such a killer query runs, my guess is that everything
else slows down. Since user_newtalk is queried on every pageview (including recent changes etc., and for anons and non-anons), it's bound to show up in the slow query log more often than other queries. Even then, it is predictably the fastest among the slow queries. Similarly, simple queries like looking up a user by ID occasionally show up in slow query log, probably again due to general slowdowns of the server.
Is there a way to specify how long a query can take? Most queries are not critical, and maybe we should just timeout after 20 seconds and show an error message (which, technically, should *never* come up, as queries really shouldn't take that long).
Regards,
Erik
-- +++ GMX - Mail, Messaging & more http://www.gmx.net +++ NEU: Mit GMX ins Internet. Rund um die Uhr für 1 ct/ Min. surfen!
_______________________________________________ Wikitech-l mailing list Wikitech-l@wikipedia.org http://www.wikipedia.org/mailman/listinfo/wikitech-l
When does the slow query log place an entry for the slow query? Based on what is in the log, I would guess that the log is written to after the query finishes. So, if the log has some entries like:
select .... took 130 seconds update .... took 60 seconds insert .... took 513 seconds
It would be reasonable to assume that the insert was the statement that started the problem. Perhaps it would be usefull to write a quick script to reorder the slow query log, sorting statements according to when the startd execution, rather than when they finished. It might be easier to make sense of it all that way
Let me know if I am way off base here.
Jason
Ian Gilfillan wrote:
- it's not the volume of queries, it's often the query that starts the
backing up. So look at the first query in the log, and the first queries after a period of time running OK. Two that I saw this way are:
On sab, 2003-02-08 at 14:26, Brion Vibber wrote:
Slow query log from Friday morning to present: http://www.wikipedia.org/tools/slowquerylog.gz
Now extends through, well, the present present.
Remaining things that need to be made less ugly slow are: recentchanges, search, watchlist.
Recentchanges looks okay, except it should benefit from the inverse timestamp trick (or Mysql4 upgrade); this would allow an index sort.
Search has been discussed elsewhere...
The watchlist:
SELECT DISTINCT cur_id,cur_namespace,cur_title,cur_comment, cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new FROM cur,watchlist WHERE wl_user=7457 AND wl_title=cur_title AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1) ORDER BY inverse_timestamp LIMIT 100;
+-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+ | watchlist | ref | wl_user | wl_user | 4 | const | 1730 | where used; Using index; Using temporary; Using filesort | | cur | ref | cur_namespace,cur_title,name_title_timestamp | cur_title | 255 | watchlist.wl_title | 1 | where used | +-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
Explain doesn't give any different results if the namespace+1 is taken out; so adding near-dupe rows to cover talk pages wouldn't help.
I'm not quite sure how to match up the indexes right. Note that watchlist has a composite index on wl_user, wl_namespace, and wl_title... but I don't know if it can match those up if part 1 of the index is a constant, and parts 2 and 3 match 1 and 2 of the other table's index.
Some prolific users have three thousand or more titles in their watchlist; the total number of distinct watched pages on the en.wiki is 30409.
Is there any way we could do this without temporary tables & filesorts & whatnot?
-- brion vibber (brion @ pobox.com)
On Tue, Feb 11, 2003 at 12:35:37AM -0800, Brion Vibber wrote:
On sab, 2003-02-08 at 14:26, Brion Vibber wrote:
The watchlist:
SELECT DISTINCT cur_id,cur_namespace,cur_title,cur_comment, cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new FROM cur,watchlist WHERE wl_user=7457 AND wl_title=cur_title AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1) ORDER BY inverse_timestamp LIMIT 100;
+-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+ | watchlist | ref | wl_user | wl_user | 4 | const | 1730 | where used; Using index; Using temporary; Using filesort | | cur | ref | cur_namespace,cur_title,name_title_timestamp | cur_title | 255 | watchlist.wl_title | 1 | where used | +-----------+------+----------------------------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
Explain doesn't give any different results if the namespace+1 is taken out; so adding near-dupe rows to cover talk pages wouldn't help.
Strange. Did you force it to use the index Name_title_timestamp with USE? MySQL sometimes chooses a smaller index than it should.
I'm not quite sure how to match up the indexes right. Note that watchlist has a composite index on wl_user, wl_namespace, and wl_title... but I don't know if it can match those up if part 1 of the index is a constant, and parts 2 and 3 match 1 and 2 of the other table's index.
They don't have to match. All that is required is that the columns on which it joins watchlist with cur are a prefix of some index on cur. So without the namespace+1 part the name_title_timestamp index should do, and with probably also (but I'm not sure because I don't know how it handles the OR). As always, EXPLAIN will tell you if it uses the right indices or not.
Some prolific users have three thousand or more titles in their watchlist; the total number of distinct watched pages on the en.wiki is 30409.
Is there any way we could do this without temporary tables & filesorts & whatnot?
Yes, there is, you could add an extra inverse_timestamp column to the watchlist that (very redundantly) stores the inverse_timestamp of the watched page. Obviously this would have to be updated each time the page is updated (and when the page is added to the watchlist). If you then have an index on (wl_user, inv_timestamp) and make sure MySQL uses it then it won't have to sort.
-- Jan Hidders
wikitech-l@lists.wikimedia.org