The Jeremy Zawodny book?:)
Temporary table use by server, per second:
  Server   total   on disk
  Ariel    1.06    0.204
  Suda     0.164   0.162
  Bacon    0.13    0.13 (all)
So, temporary tables which use the disks aren't really that frequent.
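(For anyone wanting to check this on their own server, the standard MySQL counters for this are Created_tmp_tables and Created_tmp_disk_tables; a rough sketch of deriving per-second averages:)

  -- Counters since server start:
  SHOW STATUS LIKE 'Created_tmp%';
  SHOW STATUS LIKE 'Uptime';

  -- Rough per-second averages over the whole uptime:
  --   total   = Created_tmp_tables      / Uptime
  --   on disk = Created_tmp_disk_tables / Uptime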
Suda uses its smaller drive pair for temporary and log tables. The database servers have 6 hot-swap bays, so taking one drive for temporary tables would cost two drives in the main array; I don't think that's worth doing. I do recall some discussion with Silicon Mechanics about an internal bay which can hold a SATA drive, and adding a 36GB 10K RPM SATA drive there for temporary tables might be worth doing - or, for Suda, using the 36GB drive not currently in use. But disk-based temporary tables are infrequent enough that I'm not really worried.
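(If a dedicated temporary-table spindle does get added at some point, pointing MySQL at it is a one-line my.cnf change; the mount point below is just a placeholder:)

  [mysqld]
  # send on-disk temporary tables to the dedicated drive
  # (/mnt/tmp is a placeholder path)
  tmpdir = /mnt/tmp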
All of the database servers have write buffering turned on and it makes a significant difference. All are ordered with battery backup. The possible exception is Bacon, where I need to check whether it has the battery and ask Jimbo to order that module (about US$100) if not.
The indexes are fair to good for most queries but not good enough for the special pages. I'm deliberately holding off on suggesting big changes there until after the next release of the software. Getting the schema changes for cur/old is far more important and I don't want to distract people from them - see the meta page. I plan to script the special page updates first to prove what is faster, then submit those scripted versions for PHP implementation on the live site. This will involve temporary tables and/or summary tables, both to get perfect indexes and to avoid long-duration read queries on cur.
A few queries force the right index. Generally MySQL gets it right - there's only one case where it doesn't now. See the Querybane page for that one and for the queries which are problematic enough to need special handling. The cur/old schema change and the search query deal with most of them.
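(For anyone unfamiliar with it, forcing an index is just MySQL's hint syntax; the index name below is illustrative, not necessarily what the live queries use:)

  -- illustrative only: hint the optimizer when it picks the wrong index
  SELECT cur_id, cur_timestamp
  FROM cur USE INDEX (name_title_timestamp)
  WHERE cur_namespace = 0
    AND cur_title = 'Main_Page';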
One query which needs work is the category list; see the link below, which explains the problems with it. It needs a periodically generated summary table. Querycache can't currently do the job because it lacks a field for the initial limit value.
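(A sketch of the summary-table idea for this one; the table and column names are hypothetical, not an existing schema:)

  -- regenerated periodically by a script: store the title found at each
  -- LIMIT offset so the live query can seek directly via an index
  -- instead of scanning OFFSET rows every time
  CREATE TABLE category_offsets (
    co_category VARCHAR(255) NOT NULL,   -- category name
    co_offset   INT          NOT NULL,   -- starting offset of the chunk
    co_title    VARCHAR(255) NOT NULL,   -- first title in that chunk
    PRIMARY KEY (co_category, co_offset)
  );

The live page would then look up the stored title for its offset and do an indexed range scan from there.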
URLs: Querybane: http://wp.wikidev.net/Querybane Category list: http://wp.wikidev.net/Category_list_query
user_Jamesday wrote:
The Jeremy Zawodny book?:)
The very one :-)
Temporary table use by server, per second:
  Server   total   on disk
  Ariel    1.06    0.204
  Suda     0.164   0.162
  Bacon    0.13    0.13 (all)
So, temporary tables which use the disks aren't really that frequent.
Suda uses its smaller drive pair for temporary and log tables. The database servers have 6 hot-swap bays, so taking one drive for temporary tables would cost two drives in the main array; I don't think that's worth doing. I do recall some discussion with Silicon Mechanics about an internal bay which can hold a SATA drive, and adding a 36GB 10K RPM SATA drive there for temporary tables might be worth doing - or, for Suda, using the 36GB drive not currently in use. But disk-based temporary tables are infrequent enough that I'm not really worried.
OK.
All of the database servers have write buffering turned on and it makes a significant difference. All are ordered with battery backup. The possible exception is Bacon, where I need to check whether it has the battery and ask Jimbo to order that module (about US$100) if not.
Cool.
The indexes are fair to good for most queries but not good enough for the special pages. I'm deliberately holding off on suggesting big changes there until after the next release of the software. Getting the schema changes for cur/old is far more important and I don't want to distract people from them - see the meta page. I plan to script the special page updates first to prove what is faster, then submit those scripted versions for PHP implementation on the live site. This will involve temporary tables and/or summary tables, both to get perfect indexes and to avoid long-duration read queries on cur.
A few queries force the right index. Generally MySQL gets it right - there's only one case where it doesn't now. See the Querybane page for that one and for the queries which are problematic enough to need special handling. The cur/old schema change and the search query deal with most of them.
We should of course wait for the new DB schema to be set up before fiddling with the indices. :-) What I meant was that currently we have (for example) a (cur_title) index and a combined (cur_namespace,cur_title) index, last time I checked. A single (cur_title,cur_namespace) index (note the order!) would be sorted by cur_title and should suffice. That would save space, both in RAM and on disk. I just noticed that we also have a (cur_namespace,cur_title,inverse_timestamp) index. Triple indexing of cur_title is a little overkill, IMHO.
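(Roughly the consolidation I mean - index names here are from memory, so treat them as illustrative:)

  -- leftmost-prefix rule: an index on (cur_title, cur_namespace) is
  -- sorted by cur_title first, so it also serves queries filtering on
  -- cur_title alone, making a standalone cur_title index redundant
  ALTER TABLE cur
    DROP INDEX cur_title,
    ADD INDEX title_namespace (cur_title, cur_namespace);

  -- both of these can then use title_namespace:
  SELECT cur_id FROM cur WHERE cur_title = 'Foo';
  SELECT cur_id FROM cur WHERE cur_title = 'Foo' AND cur_namespace = 0;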
One query which needs work is the category list; see the link below, which explains the problems with it. It needs a periodically generated summary table. Querycache can't currently do the job because it lacks a field for the initial limit value.
Maybe we should just get rid of the alphabetical list display and replace it with a tree-like display. That would get rid of the LIMIT/OFFSET thing - we could just query for subcategories of a category, using cur_namespace, cur_id, cl_from and cl_to.
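(Untested sketch, assuming the cl_* fields live in categorylinks and 14 is the category namespace number:)

  -- direct subcategories of one category: pages in the category
  -- namespace whose categorylinks row points at the parent
  SELECT cur_id, cur_title
  FROM categorylinks, cur
  WHERE cl_to = 'Some_category'
    AND cl_from = cur_id
    AND cur_namespace = 14;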
Magnus
Magnus Manske wrote:
One query which needs work is the category list; see the link below, which explains the problems with it. It needs a periodically generated summary table. Querycache can't currently do the job because it lacks a field for the initial limit value.
Maybe we should just get rid of the alphabetical list display and replace it with a tree-like display. That would get rid of the LIMIT/OFFSET thing - we could just query for subcategories of a category, using cur_namespace, cur_id, cl_from and cl_to.
Getting rid of the alphabetical list is a bad idea; if anything, it would be great if the alphabetical sort were improved so that it collated uppercase, lowercase and special characters together, e.g. s, S, š, etc. Creating tree-like content for a tree-like display is a Wikipedia-only feature and would detract from its usefulness for Wiktionary.
Thanks, GerardM