I just updated the CVS with the caching mechanism I proposed on the tech talk page.
It is deactivated by default, because you need to alter your database prior to use.
To activate:
1. In mysql, add the field cur_cache: ALTER TABLE cur ADD cur_cache MEDIUMTEXT
2. Edit wikiText.php and remove the # from the last coding line, so that it reads $useCachedPages = true ;
The one thing I didn't implement is the regular "forced update" of the cache. This could be done by counting the views and flushing the cache after, say, 50 views. It could also be time-based, so that the cache is refreshed if it is more than two weeks old. (These are just example values.)
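For illustration only, the two "forced update" rules could be combined into a single staleness check along the following lines. This is a minimal sketch in Python, not the code in wikiText.php; the function name and its arguments are hypothetical, and the thresholds are just the example values mentioned above.

```python
from datetime import datetime, timedelta

# Example thresholds from the message above; both are illustrative values only.
MAX_VIEWS = 50
MAX_AGE = timedelta(weeks=2)


def cache_is_stale(views_since_cached, cached_at, now):
    """Return True when a cached page should be flushed and re-rendered.

    views_since_cached: number of views served from the cache (hypothetical counter)
    cached_at:          datetime at which the cache entry was written
    now:                current datetime
    """
    # View-based rule: flush after the cache has been served MAX_VIEWS times.
    if views_since_cached >= MAX_VIEWS:
        return True
    # Time-based rule: flush when the entry is more than MAX_AGE old.
    return now - cached_at > MAX_AGE
```

Either rule alone would also work; checking both simply means a rarely viewed page is still refreshed eventually.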
Note that pages with variables (like the Main Page with {{NUMBEROFARTICLES}}) are *not* cached.
I noticed a slight performance improvement on my machine (~15%) when reading a cached page, and no noticeable delay when saving the cache.
"The Need For Speed" continues on this channel ;)
Magnus
From: "Magnus Manske" Magnus.Manske@epost.de
To activate:
1. In mysql, add the field cur_cache: ALTER TABLE cur ADD cur_cache MEDIUMTEXT
2. Edit wikiText.php and remove the # from the last coding line, so that it reads $useCachedPages = true ;
I didn't see the cur_cache field in wikipedia.sql. Is that on purpose?
-- Jan Hidders
Dear fellow programmers,
I saw that the SQL code for the Recent Changes page is rather inefficient and causes a lot of database access, so I decided to improve this. However, I can only do this properly if the timestamp field in the tables is split into a day and a time field. In order to minimize the inconvenience for the other programmers, I suggest splitting my work into three steps as follows.
Step 1:
1. I will add an SQL script file 'addNewTimeStamps.sql' that changes the table definitions and fills the new columns with the values computed from the old time stamps. Jimbo needs to run this to convert the existing database to the new format.
2. I will make a new dump for 'Wikipedia.sql'.
3. I will adapt every write action in the script to the tables 'old' and 'cur' so that the new columns are also written.
4. Test and then submit to cvs.
Step 2:
1. Start programming on the Recent Changes page.
2. Test and submit.
Step 3:
1. Write an SQL script 'removeOldTimeStamps.sql' that removes the old time stamp columns. This should also be run by Jimbo when he installs the patch that contains this step.
2. Adapt the scripts that read and write the cur table and old table, so they don't use the old columns anymore.
3. Test and submit.
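The conversion in Step 1 might look roughly like the sketch below. It is only an illustration: it uses Python's sqlite3 module as a stand-in for MySQL, it assumes the old timestamp is stored as a 14-character YYYYMMDDHHMMSS string, and the new column names cur_day and cur_time are placeholders. The real 'addNewTimeStamps.sql' may differ on all three points.

```python
import sqlite3

# In-memory stand-in for the real database; the schema is reduced to the
# columns that matter for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_title TEXT, cur_timestamp TEXT)")
conn.executemany(
    "INSERT INTO cur VALUES (?, ?)",
    [("Main_Page", "20020315120000"), ("Physics", "20020316093000")],
)

# Step 1: add the new columns next to the old one, so that existing code
# that still reads cur_timestamp keeps working.
conn.execute("ALTER TABLE cur ADD COLUMN cur_day TEXT")
conn.execute("ALTER TABLE cur ADD COLUMN cur_time TEXT")

# Backfill the new columns from the old timestamp
# (assumed format: YYYYMMDDHHMMSS).
conn.execute(
    "UPDATE cur SET cur_day = substr(cur_timestamp, 1, 8), "
    "cur_time = substr(cur_timestamp, 9, 6)"
)

rows = conn.execute(
    "SELECT cur_title, cur_day, cur_time FROM cur ORDER BY cur_title"
).fetchall()
```

Keeping the old column until Step 3 is what lets the three steps be installed separately: nothing that reads the old format breaks in between.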
If anyone has objections, a better plan, or would like me to wait with this until a certain piece of work has been finished, please let me know.
Kind regards,
-- Jan Hidders
Jan Hidders wrote:
Dear fellow programmers,
I saw that the SQL code for the Recent Changes page is rather inefficient and causes a lot of database access, so I decided to improve this. However, I can only do this properly if the timestamp field in the tables is split in a day and a time field.
Can I ask how exactly that would help? I'm not much of a database guru, so the answer isn't obvious to me and I'm a bit curious.
-- brion vibber (brion @ pobox.com)
From: "Brion L. VIBBER" brion@pobox.com
Jan Hidders wrote:
Dear fellow programmers,
I saw that the SQL code for the Recent Changes page is rather inefficient and causes a lot of database access, so I decided to improve this. However, I can only do this properly if the timestamp field in the tables is split in a day and a time field.
Can I ask how exactly that would help? I'm not much of a database guru, so the answer isn't obvious to me and I'm a bit curious.
It allows me to do a GROUP BY on the day. That way I can take a left outer join between the cur table and the old table and group on the combination of cur_day and cur_title. This allows me to get all the information I need for the page in one SQL statement.
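A query of that shape could look something like the sketch below. The actual statement is not shown in this thread, so this is a guess at the idea rather than the real code: it runs on Python's sqlite3 module instead of MySQL, and the columns old_title, old_day, and the derived count are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur (cur_title TEXT, cur_day TEXT, cur_time TEXT);
CREATE TABLE old (old_title TEXT, old_day TEXT, old_time TEXT);
INSERT INTO cur VALUES ('Physics',   '20020316', '093000');
INSERT INTO cur VALUES ('Main_Page', '20020315', '120000');
INSERT INTO old VALUES ('Physics',   '20020316', '081500');
INSERT INTO old VALUES ('Physics',   '20020316', '090000');
INSERT INTO old VALUES ('Physics',   '20020310', '101010');
""")

# One statement for the whole page: every current article appears once
# per (day, title), together with the number of older revisions made on
# that same day. The LEFT OUTER JOIN keeps articles without any older
# revision, for which COUNT(old_title) is 0.
query = """
SELECT cur_day, cur_title, COUNT(old_title) AS earlier_edits_that_day
FROM cur
LEFT OUTER JOIN old
  ON old_title = cur_title AND old_day = cur_day
GROUP BY cur_day, cur_title
ORDER BY cur_day DESC, cur_title
"""
rows = conn.execute(query).fetchall()
```

With a separate day column, the join condition and the GROUP BY can both use plain column comparisons instead of an expression computed from the full timestamp.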
-- Jan Hidders
(moving thread to wikitech-l)
Jan Hidders wrote:
From: "Brion L. VIBBER" brion@pobox.com
Jan Hidders wrote:
Dear fellow programmers,
I saw that the SQL code for the Recent Changes page is rather inefficient and causes a lot of database access, so I decided to improve this. However, I can only do this properly if the timestamp field in the tables is split in a day and a time field.
Can I ask how exactly that would help? I'm not much of a database guru, so the answer isn't obvious to me and I'm a bit curious.
It allows me to do a GROUP BY on the day. That way I can take a left outer join between the cur table and the old table and group on the combination of cur_day and cur_title. This allows me to get all the information I need for the page in one SQL statement.
Right. Hmm, can you use TO_DAYS(cur_timestamp) or some such? Or is that just going to cause problems?
If it's more efficient to use a split timestamp, I can't come up with any objection. But, it's Magnus' baby. :)
-- brion vibber (brion @ pobox.com)
wikipedia-l@lists.wikimedia.org