There's been talk for some time of reworking the cur/old split into a table of page information and a table of edit revision information. Attached is a proposed (untested!) schema for page and revisions tables and the transformation from the current ones.
The main expected benefit is to simplify code that has to deal with both old and current revisions: recentchanges, contribs, history, and of course saving of pages, renaming, etc. Instead of jumping through hoops to deal with the same kind of data in cur and old, all same-purpose data should stay in the same place.
This will also simplify renames: there's no need to touch the revisions table, only page (and of course links, searchindex...). Saves won't have to duplicate information from cur to old; just insert the new row into revisions and update the current-revision pointer in page.
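To make the save path concrete, here's a toy sketch in Python/SQLite of what the split might look like. All table and column names (page_latest, rev_prior, etc.) are placeholders of mine, not the attached schema:

```python
import sqlite3

# Toy version of the proposed split: page holds per-title data plus a
# pointer to its current revision; revision holds one row per edit.
# Names here are illustrative placeholders, not the attached schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    page_latest    INTEGER NOT NULL DEFAULT 0   -- current revision pointer
);
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER NOT NULL REFERENCES page(page_id),
    rev_text      TEXT NOT NULL,
    rev_comment   TEXT,
    rev_timestamp TEXT NOT NULL,
    rev_prior     INTEGER NOT NULL DEFAULT 0    -- previous revision; 0 = new page
);
""")

def save_page(conn, page_id, text, comment, timestamp):
    """Save = insert one revision row, repoint the page. No cur->old copy."""
    (prior,) = conn.execute(
        "SELECT page_latest FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO revision (rev_page, rev_text, rev_comment,"
        " rev_timestamp, rev_prior) VALUES (?, ?, ?, ?, ?)",
        (page_id, text, comment, timestamp, prior),
    )
    conn.execute(
        "UPDATE page SET page_latest = ? WHERE page_id = ?",
        (cur.lastrowid, page_id),
    )
    return cur.lastrowid

conn.execute("INSERT INTO page (page_id, page_namespace, page_title)"
             " VALUES (1, 0, 'Sandbox')")
r1 = save_page(conn, 1, "first draft", "new page", "20031210170600")
r2 = save_page(conn, 1, "second draft", "tweak", "20031210170700")
```

A rename is then a single UPDATE of page_title (plus the link tables); the revision rows never move.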
Additionally, the current revisions of pages will have an ID number consistent with the old revisions, so we can create revision-specific citation URLs. [Though the deletion system will need to be munged to make it possible to preserve these across delete/undelete cycles.]
A few notes: I think we can get rid of inverse_timestamp. It was needed on MySQL 3.x, which didn't support reverse sorting on indexes and was dog slow at descending sorts; but MySQL 4.0 has been 'in production' for a while now, and if you're running a large wiki on 3.x it's your own fault. :)
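For illustration, the descending sort that inverse_timestamp used to fake can simply be asked for directly (sketched in Python/SQLite with made-up names; on MySQL 4.0 the same ORDER BY ... DESC can walk the timestamp index backwards):

```python
import sqlite3

# Made-up names; the point is only the ordering. With reverse index
# scans available, history queries sort on the real timestamp column
# descending instead of keeping a second, inverted column around.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY,"
             " rev_timestamp TEXT NOT NULL)")
conn.execute("CREATE INDEX rev_ts ON revision (rev_timestamp)")
conn.executemany("INSERT INTO revision VALUES (?, ?)",
                 [(1, "20031208120000"),
                  (2, "20031209120000"),
                  (3, "20031210120000")])

# Newest first, straight off the index -- no inverse_timestamp needed.
history = [ts for (ts,) in conn.execute(
    "SELECT rev_timestamp FROM revision ORDER BY rev_timestamp DESC")]
```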
The schema includes a place for a title sort key in page. With appropriate munging (see docs somewhere on unicode.org) this can be used to get language-appropriate sorting of titles in display. That would be a good thing.
There's also a rev_prior key to link to the previous revision. The old table had this back in phase 2, but it wasn't generally used and got dropped; however, it's needed to make it feasible to put diff links into user contribs and other places where we don't necessarily already have the next revision sitting in a buffer.
I've dropped the unused old_flags (though maybe we should have another flag field to indicate future compression options -- gzip and diff have been suggested). Also cur_is_new is dropped in favor of rev_prior; you could key on rev_prior=0 to find new versions.
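A sketch of both uses of rev_prior, again in Python/SQLite with placeholder names: diff links for a contribs listing come straight off each row, and rev_prior = 0 takes over cur_is_new's job:

```python
import sqlite3

# Placeholder names, not the actual schema. Each revision row carries
# the id of the revision it replaced (0 if it started the page).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY,"
             " rev_user TEXT NOT NULL, rev_prior INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)",
                 [(1, "Brion", 0),   # started a page
                  (2, "Erik", 1),
                  (3, "Brion", 2)])

# Contribs can emit diff links (old=rev_prior, new=rev_id) without
# fetching the neighbouring revision at all.
diffs = conn.execute(
    "SELECT rev_id, rev_prior FROM revision"
    " WHERE rev_user = 'Brion' ORDER BY rev_id"
).fetchall()

# cur_is_new replacement: revisions with rev_prior = 0 created a page.
new_pages = conn.execute(
    "SELECT rev_id FROM revision WHERE rev_prior = 0"
).fetchall()
```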
Comments? Ideas? Complaints?
-- brion vibber (brion @ pobox.com)
Brion-
> Comments? Ideas? Complaints?
Possible performance drawbacks when dealing with huge MySQL tables? Limits?
More time needed to fix corruption? (If corruption occurs in CUR, fixing that table should be enough. If it occurs in the big-ass table for all pages, prepare to wait.)
More difficult backup+table export (for those who want just CUR and not OLD), but that can be dealt with.
Regards,
Erik
On Dec 10, 2003, at 17:06, Erik Moeller wrote:
> Brion-
> Comments? Ideas? Complaints?
> Possible performance drawbacks when dealing with huge MySQL tables?
The revisions table would be about the size of the present old table. It would have some more rows, but would no longer have the overhead of duplicate title information. Meanwhile, some functions that deal with page titles but don't care about their contents (whatlinkshere, allpages, orphans, etc) may well be faster by having a smaller page table instead of the text-bloated cur.
> Limits?
The maximum size of an InnoDB table (really of the total InnoDB tablespace) is apparently 4 billion pages, at the default page size of 16 KB. That's 64 terabytes.
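The arithmetic checks out if "4 billion" really means 2^32 (a guess on my part, since that's the natural 32-bit page-number limit):

```python
# 2**32 pages of 16 KiB each -- assuming "4 billion" is really 2**32.
pages = 2 ** 32
page_size = 16 * 1024            # 16 KiB
total_tib = pages * page_size / 2 ** 40
# total_tib is 64.0, i.e. the 64 terabytes quoted above
```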
As the wiki is edited more and more, the relative number of current revisions compared to old revisions will continue to shrink; storage-related limits wouldn't be significantly different than with the present schema.
> More time needed to fix corruption? (If corruption occurs in CUR, fixing that table should be enough. If it occurs in the big-ass table for all pages, prepare to wait.)
And if there's corruption in the big old table in the old schema? Or the smaller page table in the new schema?
> More difficult backup+table export (for those who want just CUR and not OLD), but that can be dealt with.
It's already inconvenient for anyone who wants regular updates; there's a lot of work that can be done on that front. A format that can be sensibly read and organized, rather than one big dump of SQL statements, might be useful.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org