Ivan Krstic wrote:
Timwi wrote:
I'm not sure I fully understand what you mean by this. Apparently we do update something like the "recentchanges" table on every edit, because some people seem to have thought that it would make for better performance, but I'm pretty much convinced that this assumption is fallacious.
I lay no claims to expertise in this area, but I'm (and I'm sure others are) more than willing to hear better proposals. How would you do it?
Do what? Recent Changes?
Currently, on every edit, among lots of other things, this happens:
- a row is inserted into "old" (involves a BLOB and several varchars)
- the row in "cur" is updated (also involves a BLOB and several varchars)
- another row is inserted into "recentchanges" (several varchars)
This is three operations for something that can be done in one. (Simply insert a row into some generic table, 'articlerevisions' or somesuch.) There is neither a reason for the cur/old split, nor is there a reason for the 'recentchanges' table.
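A minimal sketch of the single-table idea, using Python's sqlite3 module as a stand-in for the real MySQL setup. The column names (rev_id, title, summary, body) and the helper functions are assumptions made for illustration; only the table name 'articlerevisions' comes from the text above. It shows that both the current text and a Recent Changes listing can be read from the one table that each edit inserts into.

```python
import sqlite3


def open_single_table_db():
    """Create an in-memory database with one hypothetical revisions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE articlerevisions (
               rev_id  INTEGER PRIMARY KEY AUTOINCREMENT,
               title   TEXT NOT NULL,
               summary TEXT NOT NULL,
               body    BLOB NOT NULL
           )"""
    )
    # Index so that "latest revision of a title" does not scan the table.
    conn.execute("CREATE INDEX rev_title ON articlerevisions (title, rev_id)")
    return conn


def save_edit(conn, title, summary, body):
    """One edit is one INSERT; nothing else is updated."""
    with conn:
        cur = conn.execute(
            "INSERT INTO articlerevisions (title, summary, body) VALUES (?, ?, ?)",
            (title, summary, body),
        )
    return cur.lastrowid


def current_text(conn, title):
    """The 'cur' row is simply the newest revision of the title."""
    row = conn.execute(
        "SELECT body FROM articlerevisions "
        "WHERE title = ? ORDER BY rev_id DESC LIMIT 1",
        (title,),
    ).fetchone()
    return row[0] if row else None


def recent_changes(conn, limit=50):
    """Recent Changes is a query over the same table, newest first."""
    return conn.execute(
        "SELECT rev_id, title, summary FROM articlerevisions "
        "ORDER BY rev_id DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Whether such a query is fast enough at Wikipedia's scale is exactly the open question in this thread; the sketch only shows that the data in "cur" and "recentchanges" is derivable from the revisions themselves.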
Now, of course the database schema I proposed would indeed require two inserts (rather than just one) for this operation:
- one insert for the metadata (edit summary being the only varchar)
- one insert for the actual article text (the BLOB and nothing else)
However, this is not noticeably more costly than inserting the same data into a single table that contains both (which would be bad anyway, because reading the metadata then becomes inefficient whenever you don't need the article text). Furthermore, it means you don't lock a table of central importance for quite as long.
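The metadata/text split can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. The table names (revision_meta, revision_text), the columns, and the function names are all hypothetical and are not taken from the proposed schema itself. The point illustrated is that an edit costs two small inserts in one transaction, and that history-style reads never touch the table holding the BLOBs.

```python
import sqlite3


def open_split_db():
    """Create hypothetical metadata and text tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE revision_meta (
            rev_id  INTEGER PRIMARY KEY AUTOINCREMENT,
            title   TEXT NOT NULL,
            summary TEXT NOT NULL
        );
        CREATE TABLE revision_text (
            rev_id INTEGER PRIMARY KEY REFERENCES revision_meta (rev_id),
            body   BLOB NOT NULL
        );
        """
    )
    return conn


def save_split_edit(conn, title, summary, body):
    """Two inserts per edit, committed together or not at all."""
    with conn:
        cur = conn.execute(
            "INSERT INTO revision_meta (title, summary) VALUES (?, ?)",
            (title, summary),
        )
        rev_id = cur.lastrowid
        conn.execute(
            "INSERT INTO revision_text (rev_id, body) VALUES (?, ?)",
            (rev_id, body),
        )
    return rev_id


def edit_history(conn, title):
    """Read metadata only; the BLOB table is not involved."""
    return conn.execute(
        "SELECT rev_id, summary FROM revision_meta "
        "WHERE title = ? ORDER BY rev_id DESC",
        (title,),
    ).fetchall()


def text_of(conn, rev_id):
    """Fetch the article text only when it is actually needed."""
    row = conn.execute(
        "SELECT body FROM revision_text WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    return row[0] if row else None
```

SQLite's locking behaviour differs from MySQL's, so this says nothing about the lock-duration argument; it only illustrates the shape of the two inserts and the metadata-only read path.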
Again, please let me stress that I do not claim to be an expert in this. Although I have created database schemata in the past, I have not had the benefit of testing them under load as heavy as that of Wikipedia. However, a major contribution to my experience is my involvement at LiveJournal, which is also a pretty heavily loaded website.
Timwi