Tim Starling wrote:
> Brion Vibber wrote:
>> * The updater ran for a few minutes shy of 10 hours. Most of that
>> time was spent shuffling cur entries into the old table, where they
>> eventually become plain old text entries. The pulling of revision
>> data out of old (by now renamed to text) seemed to take a smaller
>> portion of the time, but I foolishly didn't time the individual
>> steps.
>
> If this proves to be a problem, we could write an external storage
> class for retrieving text from the cur table, similar to the stub
> objects for concatenated gzip compression. That would at least reduce
> the size of the required disk writes. The whole cur row may still
> need to be loaded in order to construct the page table, such is the
> nature of MySQL. The text could be moved to the text table while the
> wiki is read/write.
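
Concretely, the stub idea amounts to something like the following (a
rough sketch only; the class name and details here are made up, and
the version now checked in differs):

<?php
// Stub object stored (serialized, with an 'object' flag) in the text
// table in place of the actual text. The text-fetch path unserializes
// it and calls getText(), which pulls the text from the original cur
// row on demand, so the bulk copy never has to move the text itself.
class CurStub {
	var $mCurId;

	function CurStub( $curId = 0 ) {
		$this->mCurId = $curId;
	}

	function getText() {
		$dbr =& wfGetDB( DB_SLAVE );
		$row = $dbr->selectRow( 'cur', array( 'cur_text' ),
			array( 'cur_id' => $this->mCurId ) );
		return $row ? $row->cur_text : false;
	}
}
?>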
I went ahead and tried this last night; it reduced the cur-to-old copy
to about 8 minutes, though the old-to-revision extraction still took
about 5 hours.
This is now checked in and can be enabled by setting
$wgLegacySchemaConversion = true; before running update.php. There is
not yet a final cur->text migration script, but one would be fairly
easy to write.
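
Such a script would essentially walk the text table resolving stubs in
place, something like this (untested sketch, assuming the illustrative
CurStub above and the 'object' flag convention):

<?php
// Hypothetical final cur->text migration pass: resolve every
// serialized cur stub into real text. Once no stubs remain, the cur
// table can be dropped.
require_once( 'commandLine.inc' );

$dbw =& wfGetDB( DB_MASTER );
$res = $dbw->select( 'text', array( 'old_id', 'old_text', 'old_flags' ),
	array( "old_flags LIKE '%object%'" ) );
while ( $row = $dbw->fetchObject( $res ) ) {
	$stub = unserialize( $row->old_text );
	if ( !is_object( $stub ) ) {
		continue; // flagged but not actually a stub; skip
	}
	$text = $stub->getText();
	if ( $text === false ) {
		continue; // underlying cur row missing; leave for inspection
	}
	$dbw->update( 'text',
		array( 'old_text' => $text, 'old_flags' => '' ),
		array( 'old_id' => $row->old_id ) );
}
$dbw->freeResult( $res );
?>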
As for the revision extraction, I suppose it might be acceptable to
extract only the current revision data at first, then fill in the old
revisions while the wiki is read/write... though that might be
confusing and odd, and page move / deletion / undeletion operations
could break things unless they were disabled in the meantime.
There may also be ways to make the copy more efficient; if InnoDB is
trying to bunch up a million rows in one transaction, it may be less
efficient than it should be, particularly if the buffers are smallish.
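
One obvious thing to try is committing the copy in bounded chunks
rather than letting it run as one giant transaction; illustrated here
on the cur-to-old step, with a guessed batch size and a trimmed column
list (untested sketch; the same pattern would apply to the revision
extraction):

<?php
// Copy cur into old in bounded chunks so InnoDB never has to hold a
// million-row transaction in flight at once.
$batchSize = 1000; // guess; tune against the InnoDB buffer sizes
$dbw =& wfGetDB( DB_MASTER );
$maxId = $dbw->selectField( 'cur', 'MAX(cur_id)', '' );
for ( $start = 0; $start <= $maxId; $start += $batchSize ) {
	$end = $start + $batchSize - 1;
	$dbw->begin();
	$dbw->query(
		"INSERT INTO old (old_namespace, old_title, old_text,
			old_comment, old_user, old_user_text,
			old_timestamp, old_minor_edit)
		SELECT cur_namespace, cur_title, cur_text,
			cur_comment, cur_user, cur_user_text,
			cur_timestamp, cur_minor_edit
		FROM cur WHERE cur_id BETWEEN $start AND $end" );
	$dbw->commit();
}
?>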
-- brion vibber (brion @ pobox.com)