Axel makes a good argument that the new code's simplified layout of the Recent Changes page (see http://www.piclab.com/newwiki/wiki.phtml?title=Special:Recentchanges ) really is an important loss of functionality. Since his comment, I added an article history link, but I think his comment still applies in general. I think it is of critical importance that this feature be optimized both for function and for speed. I don't want to put the old code back because it's a dog, but if I can find a way to implement the needed features in a faster way, I'm all for it.
First, I'd like to solicit an opinion from Jan or other database gurus about what might make the speed better. Currently, it is driven by a single SELECT from the current article table, which is sorted by reverse timestamp (on which there's an index), and with a LIMIT. The old (N changes) feature also required accessing the old versions table and an expensive GROUP BY, which I'd like to avoid if possible.
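For reference, the current query is roughly of this shape (I'm writing the table and column names from memory, so they may not match the actual schema):

```sql
-- Sketch of the current Recent Changes query: one SELECT on the
-- current-articles table, newest first, using the timestamp index.
SELECT cur_title, cur_timestamp, cur_user_text, cur_comment
FROM cur
ORDER BY cur_timestamp DESC
LIMIT 50;
```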
One way I might do that is to create a new "changelog" table, updated on each page change. This would hold the information needed for the RC page which would avoid having to hit the article table or the old versions table at all. It would use basically the same query (by reverse timestamp). It might even be able to do the GROUP BY faster, but if not, it would still have the advantage that multiple changes to the same page would all show up in the list at their appropriate position, which might actually be better, because that would also show the person changing it and other stats which are now buried in the (N changes).
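Something like this is what I have in mind (column names and types are just illustrative, not a final design):

```sql
-- One possible shape for the proposed "changelog" table, written to
-- on every page save. The RC page would then read only this table.
CREATE TABLE changelog (
  cl_timestamp CHAR(14)     NOT NULL,
  cl_title     VARCHAR(255) NOT NULL,
  cl_user      VARCHAR(255) NOT NULL,
  cl_comment   VARCHAR(255),
  cl_minor     TINYINT      NOT NULL DEFAULT 0,
  INDEX (cl_timestamp)
);

-- Recent Changes becomes a single indexed scan, no joins:
SELECT cl_title, cl_timestamp, cl_user, cl_comment
FROM changelog
ORDER BY cl_timestamp DESC
LIMIT 50;
```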
Jan, would the increased non-atomicity of the article save process be a problem? MySQL doesn't have transactions, so it's possible that either an article might be saved while recording that fact in the changelog fails, or vice versa. I can't think of any real negative consequences of that (each article's history list would still be accurate, because those go straight to the article tables).
Any other ideas?
On Thu, May 30, 2002 at 03:24:14PM -0700, lcrocker@nupedia.com wrote:
> Axel makes a good argument that the new code's simplified layout of the Recent Changes page (see http://www.piclab.com/newwiki/wiki.phtml?title=Special:Recentchanges ) really is an important loss of functionality. Since his comment, I added an article history link, but I think his comment still applies in general. I think it is of critical importance that this feature be optimized both for function and for speed. I don't want to put the old code back because it's a dog, but if I can find a way to implement the needed features in a faster way, I'm all for it.
I'd really like to help but I'm very busy at the moment and next week I'll be in Madison Wisc. attending a conference. So I can only make a few quick remarks here.
First I would check if the old SQL code was really that bad. Just put it side by side with the new SQL in your code and time it. A lot depends here on how MySQL optimizes this query and uses the indices, but I am now at work and cannot check with EXPLAIN how MySQL actually deals with it.
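Checking the plan is a one-liner once someone can get at the database (I'm guessing at the exact shape of the old (N changes) query here):

```sql
-- EXPLAIN shows which index MySQL picks and how many rows it scans.
-- Assumed shape of the old query: join cur against old and group
-- the changes per title.
EXPLAIN
SELECT cur_title, MAX(old_timestamp) AS latest, COUNT(*) AS n
FROM cur, old
WHERE cur_title = old_title
GROUP BY cur_title
ORDER BY latest DESC
LIMIT 50;
```

If the plan shows it resolving the GROUP BY from the index rather than a temporary table, the old query may not be the bottleneck at all.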
> First, I'd like to solicit an opinion from Jan or other database gurus about what might make the speed better. Currently, it is driven by a single SELECT from the current article table, which is sorted by reverse timestamp (on which there's an index), and with a LIMIT. The old (N changes) feature also required accessing the old versions table and an expensive GROUP BY, which I'd like to avoid if possible.
Grouping on an indexed column is not always expensive. Again, it depends on how MySQL optimizes the query.
> One way I might do that is to create a new "changelog" table, updated on each page change.
Also here it depends upon the details of how MySQL stores its records with fields such as those of type text. It could very well be that it only needs one disk read per record to retrieve the necessary information from the 'old' table, in which case the extra table will not make a difference. (In fact, if MySQL is really smart it could do it with only the indices.) Also here I would suggest simply experimenting a little. It's quite easy to create the redundant table once, then try a few queries, time them, and compare them to the original ones.
> Jan, would the increased non-atomicity of the article save process be a problem? MySQL doesn't have transactions, so it's possible that either an article might be saved while recording that fact in the changelog fails, or vice versa.
That chance is very small if you put the SQL statements very close to each other. We are already non-atomic as it is, because when a page is changed we do a delete and two inserts, and then I'm not even talking about the work that is done for the linked and unlinked tables. As long as the updates of cur and old go right, you can always regenerate the changelog with a simple SELECT over old.
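For example, something like this would rebuild it (I'm guessing at the exact table and column names):

```sql
-- If the changelog ever gets out of sync, throw it away and rebuild
-- it from the authoritative history in 'old'. Names are illustrative.
DELETE FROM changelog;

INSERT INTO changelog (cl_timestamp, cl_title, cl_user, cl_comment)
SELECT old_timestamp, old_title, old_user_text, old_comment
FROM old;
```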
-- Jan Hidders
lcrocker@nupedia.com wrote:
> One way I might do that is to create a new "changelog" table, updated on each page change. This would hold the information needed for the
Read the source code of the old UseModWiki software. It does all of this with plain files and Perl, but you could easily apply the same method to PHP and MySQL.
wikipedia-l@lists.wikimedia.org