It's the sorts on some of our queries that are so damn slow. A few hundred or a couple thousand rows can take *a couple minutes* for it to sort on the timestamp field.
I've just put in a hacky manual sort (in CVS) for the page history; all rows are returned without sorting (so in ID order, usually but not always chronological) and then output in reverse order. It's *immensely* faster. Try loading up the history on the Village Pump or Votes for Deletion, and watch it actually come up! Kinda nice. :)
I also took the opportunity to drop in the see next X links, so you don't have to load all 2500+ items at once on your (well, my) 56k modem. (These are used pretty generally, and the code really ought to be moved out of SearchEngine.php and into globalfunctions...)
Of course, it would be *nicer* to do the ORDER BY and LIMIT in mysql.
The old query: EXPLAIN SELECT old_id,old_namespace,old_title,old_user, -> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old -> WHERE old_namespace=4 AND -> old_title='Village_pump' -> ORDER BY old_timestamp DESC; +-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used; Using filesort | +-------+------+---------------+---------------+---------+-------------+------+----------------------------+
The new query: EXPLAIN SELECT old_id,old_namespace,old_title,old_user, -> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old -> WHERE old_namespace=4 AND -> old_title='Village_pump'; +-------+------+---------------+---------------+---------+-------------+------+------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+---------------+---------+-------------+------+------------+ | old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used | +-------+------+---------------+---------------+---------+-------------+------+------------+
Now, in an ideal world, it could do the sorting based on that handy index it has on old_timestamp. In the topsy-turvy world of MySQL, however, only the index used for the WHERE clause means anything. Other ways to speed it up?
http://www.mysql.com/doc/en/ORDER_BY_optimisation.html
Oh, yes, and I changed the index from old_title to a combo on old_namespace and old_title. This should cut down the number of rows a little bit where a page and its talk page are both long; we never WHERE the two columns separately on the old table, only together.
SHOW INDEX FROM old; +-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+---------+ | old | 0 | old_id | 1 | old_id | A | 744949 | NULL | NULL | | | old | 1 | old_timestamp | 1 | old_timestamp | A | 744949 | NULL | NULL | | | old | 1 | old_user | 1 | old_user | A | 14 | NULL | NULL | | | old | 1 | old_user_text | 1 | old_user_text | A | 74494 | NULL | NULL | | | old | 1 | old_namespace | 1 | old_namespace | A | 14 | NULL | NULL | | | old | 1 | old_namespace | 2 | old_title | A | 148989 | NULL | NULL | | +-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+---------+
And, since I was torturing things with slowness anyway, old is now InnoDB. Whoop.
-- brion vibber (brion @ pobox.com)
More ugly hacks... User contribs now checks the recentchanges table first, then moves on to 'cur' and 'old' if it hasn't hit its quota yet. (See in CVS.)
This makes it actually possible to look at contribs lists for very busy users. Yay!
-- brion vibber (brion @ pobox.com)
On Mon, 2003-02-03 at 11:03, Brion Vibber wrote:
More ugly hacks... User contribs now checks the recentchanges table first, then moves on to 'cur' and 'old' if it hasn't hit its quota yet. (See in CVS.)
This makes it actually possible to look at contribs lists for very busy users. Yay!
Cool. I think there should be a "My edits" link in the sidebar now that it's reasonably fast. It's too useful to have it hidden like that. Any objections?
It would also be neat to use the same paging style as in the history.
Regards,
Erik
I have fixed the contri/email links to not incorrectly show up on user subpages or pages of non-existent users. Now a check is done if the user exists or if it's an anonymous user page before the links are added.
To make this a bit faster (string comparison against user_name) I have added an index to the user table, which should already be fairly large on the en.wiki. The "patch" for that one is in maintenance as usual. I suggest running it on en at least.
I'd like to show the links also on subpages, but to do this properly, we should make sure that there are no user names with slahes in them. Right now slashes in usernames are allowed. Do we have any slash-users?
I have also added the edit comment to the user contribution list, so you can see what a specific edit was about, provided the user has specified a reason.
As always, changes in CVS, not live yet.
Regards,
Erik
On Sun, Feb 02, 2003 at 11:49:05PM -0800, Brion Vibber wrote:
I've just put in a hacky manual sort (in CVS) for the page history; all rows are returned without sorting (so in ID order, usually but not always chronological) and then output in reverse order. It's *immensely* faster. Try loading up the history on the Village Pump or Votes for Deletion, and watch it actually come up! Kinda nice. :)
I also took the opportunity to drop in the see next X links, so you don't have to load all 2500+ items at once on your (well, my) 56k modem. (These are used pretty generally, and the code really ought to be moved out of SearchEngine.php and into globalfunctions...)
Of course, it would be *nicer* to do the ORDER BY and LIMIT in mysql.
The old query: EXPLAIN SELECT old_id,old_namespace,old_title,old_user, -> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old -> WHERE old_namespace=4 AND -> old_title='Village_pump' -> ORDER BY old_timestamp DESC; +-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used; Using filesort | +-------+------+---------------+---------------+---------+-------------+------+----------------------------+
The new query: EXPLAIN SELECT old_id,old_namespace,old_title,old_user, -> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old -> WHERE old_namespace=4 AND -> old_title='Village_pump'; +-------+------+---------------+---------------+---------+-------------+------+------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+---------------+---------+-------------+------+------------+ | old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used | +-------+------+---------------+---------------+---------+-------------+------+------------+
Now, in an ideal world, it could do the sorting based on that handy index it has on old_timestamp. In the topsy-turvy world of MySQL, however, only the index used for the WHERE clause means anything. Other ways to speed it up?
Actually, that is not MySQL's fault but just a consequence of how indices work; the only way in which an index can help with sorting is that it in some sense presorts the records for you, but that is irrelevant if you also use other indices (as you do here) because they will order the records in a different way.
Anyway, there is actually a "trick" that you can use here to get always the chronological order and that is to define an in index on (old_namespace, old_title, old_timestamp) and forcing MySQL to use that index as in (assuming that this indes is named old_namespace_2):
SELECT old_id, old_timestamp FROM old USE INDEX (old_namespace_2) WHERE old_namespace=0 AND old_title='Village_pump' ORDER BY old_timestamp;
then EXPLAIN will give you:
+-------+------+-------------------------------+-----------------+---------+-------------+------+------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+-------------------------------+-----------------+---------+-------------+------+------------+ | old | ref | old_namespace,old_namespace_2 | old_namespace_2 | 256 | const,const | 29 | where used | +-------+------+-------------------------------+-----------------+---------+-------------+------+------------+
As you can see there is no sorting and that is because it uses the fact that the index already presorts the records for you. Because MySQL knows that this index can also be used as an index on (namespace, title) and even on (namespace) you can drop the indices on those combinations. I'm not sure if you can actually omit the USE but I know that if the other index is still around on (namespace, title) it will usually pick that index, and then it still has to sort.
You can use this method in most cases where we sort on timestamp, and you might even consider storing the timestamps negated (unfortunately MySQL doesn't have descending indices) in an extra redundant column and use that column for the index. Then you wouldn't even have to reverse the list and could efficiently do a query that selects that last 100 or so without reading the full list into memory.
-- Jan Hidders
On lun, 2003-02-03 at 16:59, Jan Hidders wrote:
Anyway, there is actually a "trick" that you can use here to get always the chronological order and that is to define an in index on (old_namespace, old_title, old_timestamp) and forcing MySQL to use that index
Okay, finally got around to trying this (on test wiki). The EXPLAIN results are looking very promising -- it gets rid of the "using filesort" and is presumably sorting with the index... *but only if sorting forwards*.
If it were just the order the results come in, no big deal, but we need to grab the last X rows and I don't know of another way to do that than a reverse sort and LIMIT. No one wants to transfer all 15,000 of mav's edits from the db just to display the last 50! (Well, I suppose you could do two queries, one just to count, then one forward-sorted with LIMIT N-X,X. But, that's ugly and increases load on the db.)
MySQL 3.x doesn't support using an index to sort backwards. However, if I'm reading the docs correctly*, MySQL 4.x _does_. This may be a strong argument in favor of upgrading.
* http://www.mysql.com/doc/en/ORDER_BY_optimisation.html
-- brion vibber (brion @ pobox.com)
I can't access the CVS server right now, so I've attached the experimental index tweaks and sample queries for contribs and history.
-- brion vibber (brion @ pobox.com)
Contribs:
EXPLAIN SELECT old_namespace,old_title,old_timestamp,old_comment FROM old USE INDEX (usertext_timestamp) WHERE old_timestamp > '20030201000000' AND old_user_text='XXX' ORDER BY old_timestamp DESC LIMIT 50; | table | type | possible_keys | key | key_len | ref | rows | Extra | old | ref | old_timestamp,old_user_text | old_user_text | 255 | const | 1 | where used; Using filesort | | old | range | old_timestamp,usertext_timestamp | usertext_timestamp | 269 | NULL | 1 | where used; Using filesort |
EXPLAIN SELECT old_namespace,old_title,old_timestamp,old_comment FROM old USE INDEX (user_timestamp) WHERE old_timestamp > '20030201000000' AND old_user=1 ORDER BY old_timestamp DESC LIMIT 50; +-------+------+------------------------+----------+---------+-------+------+----------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+------------------------+----------+---------+-------+------+----------------------------+ | old | ref | old_timestamp,old_user | old_user | 4 | const | 1 | where used; Using filesort | +-------+------+------------------------+----------+---------+-------+------+----------------------------+ | old | range | old_timestamp,user_timestamp | user_timestamp | 18 | NULL | 1 | where used; Using filesort |
History:
EXPLAIN SELECT old_id,old_user, old_comment,old_user_text,old_timestamp,old_minor_edit FROM old USE INDEX (name_title_timestamp) WHERE old_namespace=0 AND old_title='Main Page' ORDER BY old_timestamp DESC LIMIT 50;
+-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | old | ref | old_namespace | old_namespace | 256 | const,const | 1 | where used; Using filesort | +-------+------+---------------+---------------+---------+-------------+------+----------------------------+ | old | ref | old_namespace,name_title_timestamp | old_namespace | 256 | const,const | 1 | where used; Using filesort |
wikitech-l@lists.wikimedia.org