On Sat, Feb 01, 2003 at 02:15:00AM +0100, Erik Moeller wrote:
> There are two very good IRC channels on irc.freenode.net:
> #php and #mysql
> There are lots of knowledgeable people there who are eager to help open
> source projects; many have already heard of Wikipedia. I learned a lot
> from just being there for half an hour or so.
>
> Here are some things we need to look into:
>
> - Composite indexes and index use in general. I do not know which
> indexes are currently used on the live Wikipedia. However, after
> running the *default* database generation script, there is only a
> single composite index, namely on the watchlist table. All other
> indexes are on a single field. If I understand correctly, this means
> that when we do a query with multiple comparisons, e.g.
> WHERE foo=bar AND baz=bat, only one index is used.
That is correct, and you could be right about this (I don't know exactly what the current database schema looks like, but the last time I was involved this was still fine). But keep in mind that a composite index also adds a lot of overhead for updates, takes up more room in the index cache, and doesn't really add much speed if the extra columns don't have high selectivity, i.e., don't filter out many unwanted records.
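For illustration, a composite index would look like this (the table and column names here are made up, since I don't know the current schema):

```sql
-- Hypothetical table: with only single-column indexes on foo and baz,
-- MySQL picks just one of them for this query:
SELECT * FROM page WHERE foo = 'bar' AND baz = 'bat';

-- A composite index lets it resolve both conditions in one index lookup:
ALTER TABLE page ADD INDEX foo_baz (foo, baz);

-- EXPLAIN shows which index the optimizer actually chooses:
EXPLAIN SELECT * FROM page WHERE foo = 'bar' AND baz = 'bat';
```

Note that column order matters: the composite index is only usable when the query constrains a leftmost prefix of it (foo alone, or foo and baz together, but not baz alone).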
> - The HANDLER command. HANDLER, which works with both InnoDB and
> MyISAM, provides low-level access to tables *without locking*. You get
> reads, but you don't get consistency, which may cause problems when
> you're dealing with critical, heavily modified data. But it seems
> perfect for our archive stuff.
If there are not many updates, then there is little lock contention to begin with, so this accomplishes next to nothing, except that you have made the software more complicated.
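For reference, this is roughly what the HANDLER interface looks like (the archive table and index name here are made up):

```sql
HANDLER archive OPEN;
-- Direct index lookup: no table locking, but also no consistent
-- snapshot, so concurrent writers may be visible mid-scan:
HANDLER archive READ title_idx = ('Foo') LIMIT 5;
-- Continue the scan from the last position:
HANDLER archive READ title_idx NEXT;
HANDLER archive CLOSE;
```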
> - Upgrading to MySQL4.
Actually, I was once in favour of that, but Lee convinced me that as long as the developers of MySQL themselves don't consider it ready for production, neither should we. There is only one thing more important than speed, and that is stability.
> - Caching our article index. I have mentioned this idea before, I
> think: we could just keep our entire article index in memory
> *persistently* (currently the linkcache is filled anew for each
> request), which should make link lookup almost instant.
What makes you think it isn't already? I suspect the biggest part of the involved tables is in the cache anyway, so lookups are probably pretty fast already. So again, please make sure that this is really worth making the software harder to understand. Keeping the software straightforward and understandable for newcomers who also want to help with the programming is a big priority for any open source project.
-- Jan Hidders
.---------------------------------------------------------------------.
| Post-doctoral researcher           e-mail: jan.hidders@ua.ac.be     |
| Dept. Math. & Computer Science     tel: (+32) 3 820 2427            |
| University of Antwerp, room J1.06  fax: (+32) 3 820 2421            |
| (UIA) Universiteitsplein 1, B-2610 Wilrijk - Antwerpen (Belgium)    |
`---------------------------------------------------------------------'