I set up a copy of
nl.wikipedia.org on my test PC from the public dumps,
and ran the updater to upgrade it to 1.5. This is a medium-sized wiki,
in Latin-1 encoding.
The good news:
* It worked -- the updater ran through to completion without exploding.
* After setting $wgLegacyEncoding = 'windows-1252', it seems to properly
convert article text encoding to UTF-8 on page load.
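For reference, the relevant LocalSettings.php bit is tiny. A minimal sketch, assuming a stock install ($wgLegacyEncoding is the real setting; the surrounding lines are just typical context):

    <?php
    # LocalSettings.php (excerpt)
    $wgLanguageCode = 'nl';

    # Stored text rows are still in the old 8-bit encoding; this tells
    # MediaWiki to transcode them to UTF-8 as they are loaded:
    $wgLegacyEncoding = 'windows-1252';

Note that it's windows-1252 rather than iso-8859-1: browsers have always treated the two interchangeably, so "Latin-1" wikis tend to contain 1252-only characters like curly quotes.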
The bad news:
* The UTF-8 conversion needed for the other database fields (titles,
usernames, comments, etc.) isn't quite finished yet, so it wasn't run
automatically. (A rough sketch of what such a pass might look like
follows this list.)
* The updater ran for a few minutes shy of 10 hours. Most of that time
was spent shuffling cur entries into the old table, where they
eventually became plain old text rows. Pulling the revision data out
of old (by then renamed to text) seemed to take a smaller share of the
time, but I foolishly didn't time the individual steps.
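Here's a rough sketch of what that remaining field-conversion pass might look like. The table and column names are the real 1.5 ones as far as I know, but the direct mysqli connection and the row-at-a-time updates are purely illustrative -- the eventual script will presumably batch things and be more careful:

    <?php
    # Sketch only: transcode the non-text fields from windows-1252 to
    # UTF-8. Connection parameters are placeholders.
    $db = new mysqli( 'localhost', 'wikiuser', 'secret', 'wikidb' );

    # table => array( primary key, columns to convert )
    $fields = array(
        'page'     => array( 'page_id', array( 'page_title' ) ),
        'revision' => array( 'rev_id',  array( 'rev_user_text', 'rev_comment' ) ),
        'user'     => array( 'user_id', array( 'user_name', 'user_real_name' ) ),
    );

    foreach ( $fields as $table => $spec ) {
        list( $key, $cols ) = $spec;
        $res = $db->query( "SELECT $key, " . implode( ', ', $cols ) . " FROM $table" );
        while ( $row = $res->fetch_assoc() ) {
            $sets = array();
            foreach ( $cols as $col ) {
                # Transcode from the legacy encoding to UTF-8
                $utf8   = iconv( 'WINDOWS-1252', 'UTF-8', $row[$col] );
                $sets[] = "$col = '" . $db->real_escape_string( $utf8 ) . "'";
            }
            $db->query( "UPDATE $table SET " . implode( ', ', $sets ) .
                        " WHERE $key = " . intval( $row[$key] ) );
        }
    }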
As for where those 10 hours went: most of the time was spent in I/O
wait on the MySQL server. This machine has IDE disks purchased for
size and cost rather than speed, and relatively little memory (512 MB);
I haven't attempted to optimize the MySQL configuration for memory
usage, and I kept doing things like installing Debian in VMware in the
foreground... ;)
It probably ought to go faster on the big Wikimedia servers, but I can't
say just how much.
There may be ways to further optimize the conversion process; dropping
some of the indexes first, for instance, might be an overall win if it
makes the importing faster.
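A minimal sketch of the index idea, assuming the big tables are MyISAM (DISABLE KEYS skips non-unique index maintenance during the bulk inserts and rebuilds those indexes in one sorted pass afterwards; it's ignored for unique keys and for InnoDB):

    <?php
    # Sketch: bracket the bulk cur -> old copy with key maintenance off.
    $db = new mysqli( 'localhost', 'wikiuser', 'secret', 'wikidb' );

    # Stop updating non-unique indexes row by row (MyISAM only)
    $db->query( 'ALTER TABLE old DISABLE KEYS' );

    # ... the big INSERT INTO old ... SELECT FROM cur step runs here ...

    # Rebuild the indexes in a single pass at the end
    $db->query( 'ALTER TABLE old ENABLE KEYS' );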
Even in the ideal case these updates will be kinda slow to run, but
they really are necessary... at least the schema change should make
future changes less painful.
For the final live updates we'll probably want to do them one wiki at
a time, keeping all other wikis open for editing and the in-conversion
one available read-only from a backup copy.
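On the configuration side, the read-only half of that is easy. A sketch, assuming the standard $wgReadOnly switch; the list variable is made up:

    <?php
    # In the shared settings: lock only the wiki that's mid-conversion.
    $wgWikisBeingConverted = array( 'nlwiki' );   # hypothetical list

    if ( in_array( $wgDBname, $wgWikisBeingConverted ) ) {
        # Any non-empty string makes the wiki read-only; the text is
        # shown to users who try to edit.
        $wgReadOnly = 'Upgrading to MediaWiki 1.5; editing will be back shortly.';
    }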
With the way we've got shared document roots this might require some odd
configuration shuffling to load up either 1.4 or 1.5 code depending on
update state, but I think it should be possible.
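A sketch of what that shuffling could look like: a little dispatch stub in the shared document root that picks a codebase per wiki. All the paths and the marker-file convention here are made up for illustration:

    <?php
    # Hypothetical index.php stub in the shared document root.
    # Crude wiki key derived from the hostname:
    $wiki = str_replace( '.', '_', $_SERVER['SERVER_NAME'] );

    if ( file_exists( "/etc/mediawiki/converted/$wiki" ) ) {
        # This wiki has been upgraded; hand off to the 1.5 tree
        require '/var/www/mediawiki-1.5/index.php';
    } else {
        # Still on the old schema; keep serving 1.4
        require '/var/www/mediawiki-1.4/index.php';
    }

Touching the marker file at the end of each wiki's conversion would flip it over without having to touch the web server configuration.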
-- brion vibber (brion @ pobox.com)