There are two very good IRC channels on irc.freenode.net:
#php and #mysql
There are lots of knowledgeable people there who are eager to help open source projects; many have already heard of Wikipedia. I learned a lot from just being there for half an hour or so.
Here are some things we need to look into:
1) Composite indexes and index use in general. I do not know which indexes are currently used on the live Wikipedia. However, after running the *default* database generation script, there is just a single composite index, namely on the watchlist table. All other indexes are on a single field. If I understand correctly, this means that when we run a query with multiple comparisons, e.g. WHERE foo=bar AND baz=bat, only one index can be used. At least, that is what this article claims:
http://www.linux-mag.com/cgi-bin/printer.pl?issue=2001-06&article=mysql
There are also a couple of tables with no indexes at all (including ARCHIVE, which may cause Special:Undelete to create high server load) and some puzzling ones (site_stats has an index, but only one row). We really need to clean up our indexes. I can't help much with this without server access, because I don't know whether the table structures have been altered.
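To make this concrete, here is roughly what adding a composite index looks like. The table and column names (cur, cur_namespace, cur_title) are my guesses at the schema, not something I have verified against the live database:

```sql
-- With only single-column indexes, MySQL picks at most one
-- index for this query and filters the remaining rows itself:
SELECT cur_id FROM cur
WHERE cur_namespace = 0 AND cur_title = 'Foo';

-- A composite index covers both comparisons at once:
ALTER TABLE cur ADD INDEX ns_title (cur_namespace, cur_title);

-- EXPLAIN should now list ns_title in its "key" column:
EXPLAIN SELECT cur_id FROM cur
WHERE cur_namespace = 0 AND cur_title = 'Foo';
```

Running EXPLAIN before and after is the easiest way to check whether a new index actually gets picked up.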
2) The HANDLER command. HANDLER, which works with both InnoDB and MyISAM, provides low-level access to tables *without locking*. You get reads, but you don't get consistency, which may cause problems when you're dealing with critical, heavily modified data. But it seems perfect for our archive stuff. The HANDLER syntax is explained here: http://www.mysql.com/doc/en/HANDLER.html
We definitely should look into this.
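To give an idea of the syntax, here is a sketch of what reading archived revisions through HANDLER might look like. The table name old and the index name old_title are assumptions on my part:

```sql
-- Open a low-level cursor on the table (no table lock is
-- taken, so reads are fast but not guaranteed consistent):
HANDLER old OPEN;

-- Fetch up to 10 rows matching an index value:
HANDLER old READ old_title = ('Foo') LIMIT 10;

-- Or walk the table sequentially:
HANDLER old READ FIRST;
HANDLER old READ NEXT;

HANDLER old CLOSE;
```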
3) Upgrading to MySQL4. MySQL4 supports query caching (also subqueries, but I haven't looked into those), which means that heavily queried pages will load faster. When someone mentioned query caching, a lot of other people made comments like "query caching rocks", so I assume it provides substantial performance benefits in practice.
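Note that the query cache has to be switched on explicitly. Once we upgrade, something like the following should tell us whether it is enabled and how well it is doing (variable names per the MySQL 4.0 manual):

```sql
-- Is the cache enabled, and how large is it?
SHOW VARIABLES LIKE 'query_cache%';

-- Hit counters; a high Qcache_hits relative to
-- Qcache_inserts means the cache is earning its keep:
SHOW STATUS LIKE 'Qcache%';
```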
4) Tuning my.cnf. According to one IRC resident, upping table_cache can greatly increase performance with lots of UPDATEs. If we create new indexes, we may also need to raise the index cache (there's a formula to calculate index cache efficiency; we are currently at 99.75%, which is pretty good).
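A sketch of the relevant my.cnf entries; I am assuming the "index cache" here refers to the MyISAM key buffer (key_buffer_size). The values are placeholders to show the format, not recommendations for our hardware:

```
[mysqld]
table_cache     = 512   # open table descriptors kept cached
key_buffer_size = 64M   # in-memory cache for index blocks
```

For reference, the efficiency figure above is 1 - Key_reads / Key_read_requests, computed from the counters in SHOW STATUS.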
5) Caching our article index. I have mentioned this idea before, I think: We could just keep our entire article index in memory *persistently* (currently the linkcache is filled for each request), which should make link lookup almost instant. There are several ways to accomplish this:
- We could ask one of our resident C programmers to help. There is a documented interface in PHP for accessing persistent resources, which is used by functions such as mysql_pconnect. This page describes how it works: http://www.php-center.de/en-html-manual/phpdevel.html
- We could use session variables for the purpose. There is supposed to be an option to keep session stuff in shared memory. This might help with other caching problems as well.
- We could just put a database on a ramdisk for this purpose.
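In the same spirit as the ramdisk option, MySQL itself has an in-memory table type (TYPE=HEAP) that might do the job without extra moving parts. This is just a sketch; the table and column names are made up:

```sql
-- HEAP tables live entirely in RAM; their contents are lost
-- on server restart, which is fine for a rebuildable index:
CREATE TABLE linkindex (
  li_namespace INT NOT NULL,
  li_title CHAR(255) NOT NULL,
  PRIMARY KEY (li_namespace, li_title)
) TYPE=HEAP;
```

One caveat: old HEAP tables use hash indexes, so they are great for exact lookups (which is what link resolution needs) but not for range scans.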
The order of this list reflects the priority with which I think these questions need to be addressed. Getting our indexes to work properly should IMHO be of greatest importance. Even if EXPLAIN shows that an index is used, we may still frequently end up with large table scans because we do not use composite indexes.
Regards,
Erik