Just a note of warning for those of you using MySQL 4.1: the new character set options may cause mysqldump to output bogus data into backups that can't be restored without data loss.*
This may affect some Unicode text, and it can certainly corrupt compressed old revision text (stored when the $wgCompressRevisions option is enabled) beyond recovery. If you're using MySQL 4.1, you should probably examine and test your backup dumps to make sure they can be restored and used successfully.
Passing an option like --default-character-set=latin1 may stop mysqldump from trying to 'convert' (and thus corrupt) your data. (If your server is not set to the defaults, this may or may not be the correct value for you.) In the future we'll hopefully be able to play nicer with the new character set settings, but for now MediaWiki follows the practice used with older versions of MySQL, where there was (and still is) no way to correctly indicate the charset used in a particular database, table, or field.
* Specifically, a default "latin1" to UTF-8 conversion silently corrupts every byte with the value 0x81, 0x8d, 0x8f, 0x90, or 0x9d by turning it into a literal question mark. (MySQL's "latin1" is really Windows-1252, which leaves exactly these five values undefined.) The question marks cannot be converted back to their original byte values when the data is re-imported.
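A minimal Python sketch of the footnote's failure mode, useful for seeing why the damage is irreversible. It assumes MySQL's "latin1" behaves like Python's cp1252 codec and that undefined bytes are replaced by '?'; zlib is only a stand-in for whatever compression $wgCompressRevisions actually applies, and all function names here are hypothetical.

```python
import zlib

# Assumption: MySQL's "latin1" matches Windows-1252, where these five byte
# values have no assigned character.
UNDEFINED_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def latin1_to_utf8_lossy(raw: bytes) -> str:
    """Simulate the default conversion: undefined bytes become a literal '?'."""
    # Python's cp1252 codec rejects the five undefined bytes; errors="replace"
    # substitutes U+FFFD for each one, which we swap for '?'.
    return raw.decode("cp1252", errors="replace").replace("\ufffd", "?")


def reimport(text: str) -> bytes:
    """Convert the dumped text back to single-byte form, as a restore would."""
    return text.encode("cp1252")


def corrupted_offsets(raw: bytes) -> list[int]:
    """Return offsets of bytes that did not survive the dump/restore cycle."""
    restored = reimport(latin1_to_utf8_lossy(raw))
    return [i for i, (a, b) in enumerate(zip(raw, restored)) if a != b]


def compressed_blob_survives(text: str) -> bool:
    """Compress text (zlib as a hypothetical stand-in) and round-trip the blob."""
    original = text.encode("utf-8")
    blob = zlib.compress(original)
    restored = reimport(latin1_to_utf8_lossy(blob))
    try:
        return zlib.decompress(restored) == original
    except zlib.error:
        # A damaged stream usually fails to decode or fails its checksum.
        return False


if __name__ == "__main__":
    # Every possible byte value: only the five undefined ones are damaged.
    print(corrupted_offsets(bytes(range(256))))  # [129, 141, 143, 144, 157]
    # Whether a given blob survives depends on which bytes it happens to contain.
    print(compressed_blob_survives("Some old revision text. " * 20))
```

Note that a '?' (0x3F) already present in the data round-trips unchanged, so after import there is no way to tell which question marks were originally 0x81, 0x8d, 0x8f, 0x90, or 0x9d.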
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org