So, Wikitravel uses the default ISO-8859-1 encoding, and we're starting to need UTF-8 characters.
I think the process for converting might be as follows:
1. Shut down the site.
2. Back up the database with a data dump.
3. iconv the dump file to UTF-8.
4. Twiddle with mysql till it knows to use UTF-8.
5. Twiddle with PHP's php.ini till it knows to use UTF-8.
6. Change the encoding in LocalSettings.php to use UTF-8.
7. Delete everything in the database.
8. Import the data dump back in.
9. Turn the site back on.
10. Hope for the best.
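Step 3 is plain byte-level transcoding. Every ISO-8859-1 byte maps to the Unicode code point with the same value, so ASCII passes through unchanged and each byte from 0x80 to 0xFF becomes two bytes in UTF-8. A minimal Python sketch of what `iconv -f ISO-8859-1 -t UTF-8` does to the dump, streaming it in chunks; `convert_dump` and its file paths are hypothetical:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8, like `iconv -f ISO-8859-1 -t UTF-8`."""
    # Every byte value is a valid ISO-8859-1 character, so decoding never fails.
    return data.decode("iso-8859-1").encode("utf-8")


def convert_dump(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream a dump file from ISO-8859-1 to UTF-8 without loading it all into memory."""
    # ISO-8859-1 is a single-byte encoding, so a chunk boundary never splits a character.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(latin1_to_utf8(chunk))


if __name__ == "__main__":
    print(latin1_to_utf8(b"Caf\xe9 M\xfcnchen"))  # b'Caf\xc3\xa9 M\xc3\xbcnchen'
```

This is also why the conversion is only safe for text: it rewrites every high byte, whether or not that byte was meant as a character.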
Does this sound about right? Have any other MediaWiki installations switched encodings midstream like this?
~ESP
"EP" == Evan Prodromou evan@wikitravel.org writes:
EP> So, Wikitravel uses the default ISO-8859-1 encoding, and we're
EP> starting to need UTF-8 characters.
EP> I think the process for converting might be as follows:
EP> 3. iconv the dump file to UTF-8.
EP> [...]
EP> 8. Import the data dump back in.
OK, so, this won't work. linkscc has binary data in it that mucks up the dump file.
I think maybe skipping the linkscc table, and rebuilding it afterwards, might work.
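Skipping linkscc could be done by filtering the dump before it goes through iconv, and the filtering has to happen on raw bytes, since the binary rows are not valid text. A rough Python sketch, assuming each INSERT statement sits on a single line (as mysqldump normally writes it, escaping embedded newlines); `skip_table_rows` is a hypothetical helper. It keeps the CREATE TABLE statement, so the table comes back empty and can be rebuilt afterwards:

```python
import re
from typing import Iterable, Iterator

# Tables whose rows are dropped; linkscc holds binary (gzipped) data.
SKIP_TABLES = ("linkscc",)


def skip_table_rows(lines: Iterable[bytes], tables: Iterable[str] = SKIP_TABLES) -> Iterator[bytes]:
    """Yield dump lines unchanged, except INSERT statements for the given tables."""
    names = b"|".join(re.escape(t.encode("ascii")) for t in tables)
    # Matches "INSERT INTO linkscc" with or without backticks around the table
    # name, followed by a space or "(", so a table like linkscc_x is not matched.
    pattern = re.compile(rb"^INSERT INTO `?(?:" + names + rb")`?[ (]")
    for line in lines:
        if not pattern.match(line):
            yield line
```

Used with both files opened in binary mode, e.g. `dst.writelines(skip_table_rows(src))`; the filtered output is then safe to hand to iconv.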
~ESP
On Dec 6, 2003, at 14:05, Evan Prodromou wrote:
> EP> 3. iconv the dump file to UTF-8.
> EP> [...]
> EP> 8. Import the data dump back in.
> OK, so, this won't work. linkscc has binary data in it that mucks up the dump file.
> I think maybe skipping the linkscc table, and rebuilding it afterwards, might work.
The only tables that should contain binary data at present are linkscc (gzipped data) and math (a binary hash value). Both of these tables' contents are volatile: you can just clear them out, and they will be regenerated when needed.
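To see why gzipped data cannot survive step 3: the transcoding treats every byte as an ISO-8859-1 character, so any byte of 0x80 or above is rewritten as two bytes. A gzip stream always starts with the magic bytes 0x1f 0x8b, so even its header is mangled. A small Python demonstration; the sample text standing in for a cached row is made up:

```python
import gzip
import zlib


def transcode(data: bytes) -> bytes:
    # Same byte-level conversion as `iconv -f ISO-8859-1 -t UTF-8`.
    return data.decode("iso-8859-1").encode("utf-8")


def survives_transcoding(blob: bytes) -> bool:
    """Return True if a gzipped blob still decompresses to the same data after transcoding."""
    try:
        return gzip.decompress(transcode(blob)) == gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):  # bad header, truncated or corrupt stream
        return False


cached = gzip.compress(b"[[Paris]] [[Lyon]] [[Marseille]]")
print(cached[:2])                    # b'\x1f\x8b': the gzip magic number
print(transcode(cached)[:3])         # b'\x1f\xc2\x8b': 0x8b became two bytes
print(survives_transcoding(cached))  # False
```

A binary hash like the one in math is damaged the same way, which is why clearing these tables and letting them regenerate is simpler than trying to convert them.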
-- brion vibber (brion @ pobox.com)
"BV" == Brion Vibber brion@pobox.com writes:
BV> The only tables that should contain binary data at present are
BV> linkscc (gzipped data) and math (a binary hash value). Both of
BV> these tables' contents are volatile: you can just clear them
BV> out, and they will be regenerated when needed.
So, I think my new strategy is this:
1. Shut down the site.
2. Back up these tables with a data dump:
   * archive
   * cur
   * image
   * interwiki
   * ipblocks
   * old
   * oldimage
   * user
   * user_newtalk
   * watchlist
3. iconv the dump file to UTF-8.
4. Twiddle with mysql till it knows to use UTF-8.
5. Twiddle with PHP's php.ini till it knows to use UTF-8.
6. Change the encoding in LocalSettings.php to use UTF-8.
7. Reinstall MediaWiki, wiping the DB.
8. Import the data dump back in.
9. Rebuild the links and RC tables with rebuildall.php.
10. Turn the site back on.
11. Hope for the best.

I guess at step 1 I could instead just lock the database, rebuild in a new DB, and at step 10 change LocalSettings.php so it points to the new DB.
That might work.
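Step 2 of the revised plan, dumping only the listed tables, maps onto mysqldump's `mysqldump [options] db_name [tbl_name ...]` form, which dumps just the named tables. A Python sketch that builds that command; the database name, user, and output path are hypothetical, and credentials are assumed to come from a MySQL option file such as ~/.my.cnf:

```python
import subprocess
from typing import List

# The tables kept in step 2. Everything else (linkscc, math, the link and
# RC tables) is left out and regenerated or rebuilt after the import.
KEEP_TABLES = [
    "archive", "cur", "image", "interwiki", "ipblocks",
    "old", "oldimage", "user", "user_newtalk", "watchlist",
]


def dump_command(db: str, user: str, outfile: str, tables: List[str] = KEEP_TABLES) -> List[str]:
    """Build a mysqldump argv that dumps only the given tables of `db`."""
    return ["mysqldump", f"--user={user}", f"--result-file={outfile}", db, *tables]


def run_dump(db: str, user: str, outfile: str) -> None:
    # Needs mysqldump on PATH and working credentials; raises CalledProcessError on failure.
    subprocess.run(dump_command(db, user, outfile), check=True)
```

Passing an argument list rather than a shell string avoids quoting problems; for the lock-and-rebuild variant, the same dump would simply be imported into the new database instead of the wiped one.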
~ESP
mediawiki-l@lists.wikimedia.org