Now that my own proposal to re-write the entire code in Perl again was met with great resistance, I propose to at least create an entirely new database structure, and then adapt the current code to it. I have studied the current database structure and see the following rather severe problems with it:
* BLOBs that store article text are combined in the same table as meta-data (e.g. date, username of a change, change summary, minor flag, etc.). This is bad because variable-length fields like BLOBs negatively affect the performance of reading the table. Pages like the watchlist should not have to bother with variable-length data such as article text and would run a lot faster if they could get their data entirely from fixed-length rows.
* Currently, all user properties and preferences, as well as article properties, are columns in a table. Although this is not a problem in terms of DB-reading performance, introducing a new user preference or other enhancements involve adding a new column, and adding a new column becomes extremely database-intensive as the database grows. LiveJournal uses a very clever system that will easily remedy this. One table, 'userproplist', stores the possible user properties (userprops), and another table, 'userprop', stores what user has what userprop with what value. This way all that is needed for adding a new userprop is adding a single row to the 'userproplist' table. The same would analogously apply to articleprops. Once we have that, the user table will hopefully remain very small (in terms of number of columns), so looking up a username (to name just an example) would be ridiculously efficient.
* BLOBs that fulfill the same function (article text) are scattered across two tables (cur and old). This is bad because it means variable-length text has to be moved across tables on every edit. Very slow. Better to give every version/revision of an article (i.e. each item of article text) a permanent ID and use those IDs in the 'cur' and 'old' tables instead. Then have one large table, 'articletext' perhaps, mapping the IDs to their actual BLOBs. This eliminates the need to ever delete a BLOB (except perhaps when actually deleting an article with all its history, which is rare enough). Additionally, there isn't really a need for separate 'cur' and 'old' tables, especially when MemCacheD can take care of most recent versions.
* You are using a 'recentchanges' table which, I presume, gets also updated with every edit. This, I assume is the idea behind it, allows the 'Recent Changes' page to quickly grab the most recent changes without having to find them elsewhere in the DB. Contrary to intuition, this is a bad idea. It is always a better idea to optimise for less DB writes even if it means a few more DB reads, because writes are so much slower. (I am so sure of this because LiveJournal has made these experiences with their "Friends Page": grabbing entries from all the friends' journals all over the place in the DB is faster than updating a "hint table" with every newly created entry.)
In addition to these existing problems, of course there are things the database cannot currently handle, but were planned. While we're changing the DB, we could also add the following functionality to it:
* Store translated website text, so translators don't have to dig through PHP code and submit a file to the mailing list.
* A global table for bidirectional inter-wiki links. People should not have to add the same link to so many articles. In fact, taking this a step further, people should not even have to enter text like '[[fr:démocratie]]' into the article text when it's not part of the article text. There should be drop-downs underneath the big textbox listing languages, and little text boxes next to them for the target article name.
Are you all still convinced that adapting the current code to all these radical changes is easier than rewriting it all from scratch? :-)
Anyway. I'm tired.
I'm going to bed.
Good night.
Timwi