[Mediawiki-l] iso-8859-1 conversion to UTF-8 failed during upgrade from 1.4.0 to 1.5.2

Brion Vibber brion at pobox.com
Wed Nov 16 21:46:40 UTC 2005


Andre Oliveira da Costa wrote:
> Brion Vibber wrote:
>> This is normal; set $wgLegacyEncoding for runtime conversion of old text
>> entries.
> 
> ... wow, that's it? ;-) I thought it would be harder =) Thank God it's
> simple...

;)
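
For readers unfamiliar with the idea, "runtime conversion" means the stored
rows are left as they are and the charset is fixed up each time a record is
read. The following is a minimal sketch of that idea in Python, not
MediaWiki's actual code. The flag names "gzip" and "utf-8", the helper
load_text, the LEGACY_ENCODING constant (standing in for the value given to
$wgLegacyEncoding) and the use of zlib are all assumptions made for
illustration.

```python
import zlib

# Stand-in for the value a wiki would assign to $wgLegacyEncoding (assumption).
LEGACY_ENCODING = "iso-8859-1"


def load_text(blob: bytes, flags: set) -> str:
    """Return a stored text record as a string (hypothetical helper).

    `flags` describes how the record was stored; the names used here
    are illustrative only.
    """
    if "gzip" in flags:
        # Undo compression first; charset handling only makes sense on text.
        blob = zlib.decompress(blob)
    if "utf-8" in flags:
        # Record written after the upgrade: already UTF-8.
        return blob.decode("utf-8")
    # Old record left untouched by the upgrade: decode with the legacy charset.
    return blob.decode(LEGACY_ENCODING)
```

With this approach nothing has to be rewritten during the upgrade itself;
each old record is converted only when it is read.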

> I'll try it ASAP and post the results here. However, a couple of questions:
> 
> - shouldn't upgrade1_5.php convert text entries as well?

No.

> What's the point in "half converting" to utf-8?

Ask that again when _you_ have sixteen million precompressed text
records in Latin-1 and hundreds of people calling for your blood every
minute the site is offline for the upgrade. ;)

> - will $wgLegacyEncoding be around in future releases? After all, the remaining
>   non-utf8 chars will stay in the DB forever, since the upgrade script didn't
>   convert them.

Yes.

> - would a "manual" conversion using iconv (as erchache2000 suggested in this
>   thread), run after upgrade1_5.php and update.php have been applied
>   successfully, convert the whole DB from latin-1 to utf-8?

Depends on your database...

> Could this have any side-effects that
>   could compromise the DB? (eg. if images or any other binary data is stored
>   on the DB, I don't know if iconv is smart enough not to mess with "latin-1
>   chars" it might find within binary content)

Yes, that would corrupt any compressed text entries. If you're using
compressed old records you'd have to decompress the whole table before
running such a conversion.
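
To see why a blind charset conversion damages compressed records, here is a
small Python sketch. It assumes zlib as a stand-in for whatever compression
the old records use, and the sample string and the helper try_decompress are
made up for illustration. The byte-level transformation is the same one a
Latin-1 to UTF-8 pass applies to every byte it sees.

```python
import zlib

# Latin-1 text, compressed the way an old record might be (assumption: zlib).
original = "São Paulo, café".encode("iso-8859-1")
compressed = zlib.compress(original)

# A blind Latin-1 -> UTF-8 pass treats each compressed byte as a character,
# so every byte >= 0x80 is expanded into two bytes.
mangled = compressed.decode("iso-8859-1").encode("utf-8")


def try_decompress(data: bytes):
    """Return the decompressed bytes, or None if the stream is damaged."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


# The mangled stream is no longer valid compressed data.
damaged = try_decompress(mangled)

# Safe order: decompress first, then convert the charset.
converted = zlib.decompress(compressed).decode("iso-8859-1").encode("utf-8")
```

The point of the last line is the ordering: the charset conversion is only
applied once the record is plain text again.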

-- brion vibber (brion @ pobox.com)
