New subject: Proposed table restructuring

10 Dec 2003


      Apparently the archives for the list ate everything but the GPG 
signature, so here it is again as plaintext:
On Dec 10, 2003, at 16:33, Brion Vibber wrote:
...
There's been talk for some time of reworking the cur/old split into a 
table of page information and a table of edit revision information. 
Attached is a proposed (untested!) schema for page and revisions 
tables and the transformation from the current ones.
The main expected benefit is to simplify code that has to deal with 
both old and current revisions: recentchanges, contribs, history, and 
of course saving of pages, renaming, etc. Instead of jumping through 
hoops to deal with the same kind of data in cur and old, all 
same-purpose data should stay in the same place.
This will simplify renames by avoiding the need to touch the revisions 
table; only page (and of course links, searchindex...). Saves won't 
have to duplicate information from cur to old; just insert the new row 
to revisions and update the current revision pointer in page.
Additionally, the current revisions of pages will have an ID number 
consistent with the old revisions, so we can create revision-specific 
citation URLs. [Though the deletion system will need to be munged to 
make it possible to preserve these across delete/undelete cycles.]
A few notes:
I think we can get rid of inverse_timestamp. It was needed on MySQL 
3.x because it didn't support reverse sorting on indexes and was dog 
slow; but MySQL 4.0 has been 'in production' for a while now and if 
you're running a large wiki on 3.x it's your own fault. :)
The schema includes a place for a title sort key in page. With 
appropriate munging (see docs somewhere on unicode.org) this can be 
used to get language-appropriate sorting of titles in display. That 
would be a good thing.
There's also a rev_prior key to link to the previous revision. The old 
table had this back in phase 2 but it wasn't generally used and got 
dropped; however this is needed to make it feasible to put diff links 
into user contribs and other places where we don't necessarily have 
the next revision sitting in a buffer already.
I've dropped the unused old_flags (though maybe we should have another 
flag field to indicate future compression options -- gzip and diff 
have been suggested). Also cur_is_new is dropped in favor of 
rev_prior; you could key on rev_prior=0 to find new versions.
Comments? Ideas? Complaints?
-- brion vibber (brion @ pobox.com)
<patch-revisions.sql>
-- Change cur/old into page/revisions
-- page_sortkey will contain a munged version of the title for
---             language-specific sorting
-- page_current is a foreign key on revisions.rev_id
DROP TABLE IF EXISTS page;
CREATE TABLE page (
   page_id int(8) unsigned NOT NULL auto_increment,
   page_namespace tinyint(2) unsigned NOT NULL default '0',
   page_title varchar(255) binary NOT NULL default '',
   page_sortkey varchar(255) binary NOT NULL default '',
   page_restrictions tinyblob NOT NULL default '',
   page_counter bigint(20) unsigned NOT NULL default '0',
   page_is_redirect tinyint(1) unsigned NOT NULL default '0',
   page_random real unsigned NOT NULL,
   page_touched char(14) binary NOT NULL default '',
   page_current int(8) unsigned NOT NULL default 0,
   UNIQUE KEY page_id (page_id),
   UNIQUE KEY namespace_title (page_namespace,page_title),
   KEY (page_random),
);
-- rev_page is a foreign key to page.page_id
-- rev_prior will key to another rev_id for the previous revision, to 
simplify
--           making diff links etc
DROP TABLE IF EXISTS revisions;
CREATE TABLE revisions (
   rev_id int(8) unsigned NOT NULL auto_increment,
   rev_page int(8) unsigned NOT NULL default 0,
   rev_prior int(8) unsigned NOT NULL default 0,
   rev_text mediumtext NOT NULL default '',
   rev_comment tinyblob NOT NULL default '',
   rev_user int(5) unsigned NOT NULL default '0',
   rev_user_text varchar(255) binary NOT NULL,
   rev_timestamp char(14) binary NOT NULL default '',
   rev_minor_edit tinyint(1) NOT NULL default '0',
   rev_current tinyint(1) NOT NULL default 0,
   UNIQUE KEY rev_id (rev_id),
   ADD INDEX (rev_timestamp),
   ADD INDEX page_timestamp (rev_page,rev_timestamp),
   ADD INDEX user_timestamp (rev_user,rev_timestamp),
   ADD INDEX usertext_timestamp (rev_user_text,rev_timestamp);
);
-- Import old revisisons from old to revisions
INSERT INTO revisions
    (rev_id,rev_page,rev_text,rev_comment,rev_user,rev_user_text,
    rev_timestamp,rev_minor_edit,rev_flags)
   SELECT
    old_id,cur_id,old_text,old_comment,old_user,old_user_text,
    old_timestamp,old_minor_edit,old_flags)
    FROM old,cur
    WHERE old_namespace=cur_namespace and old_title=cur_title;
-- Import cur revisions from cur to revisions
-- This will create new rev_id revisions keys.
INSERT INTO revisions
    (rev_page,rev_text,rev_comment,rev_user,rev_user_text,
    rev_timestamp,rev_minor_edit,rev_cur_edit)
   SELECT
    cur_id,cur_text,cur_comment,cur_user,cur_user_text,
    cur_timestamp,cur_minor_edit
    FROM cur;
-- Import page definitions from cur to page
INSERT INTO page
    (page_id,page_namespace,page_title,page_restrictions,
    page_counter,page_is_redirect,page_random,page_touched,
    page_current)
   SELECT
    cur_id,cur_namespace,cur_title,cur_restrictions,
    cur_counter,cur_is_redirect,cur_random,cur_touched,
    rev_id
    FROM cur,revisions
    WHERE cur_id=rev_page and rev_is_current;
-- rev_prior and page_sortkey need to be filled out by a php script?

Re: Proposed table restructuring