Hi Brion,
On Wed, 16 Nov 2005 13:08:59 -0800 Brion Vibber brion@pobox.com wrote:
Andre Oliveira da Costa wrote:
If I follow it, the "missing" content is there, with latin-1 chars (so we're back to the "commas" issue again). The page title is utf-8, but the rest of the content is latin-1.
Judging by this, it seems the upgrade1_5.php script did convert URLs (and consequently page titles) from latin-1 to utf-8, but some or all of the page content was not converted.
This is normal; set $wgLegacyEncoding for runtime conversion of old text entries.
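A rough sketch of what "runtime conversion" could look like, written in Python for illustration only. It assumes, as in MediaWiki 1.5, that each stored text row carries a flags field and that rows saved after the upgrade are flagged `utf-8`. Unflagged rows are converted from the configured legacy encoding when they are read, so the database itself is never rewritten. The helper name `load_text` and the `windows-1252` default are hypothetical, not MediaWiki's actual code.

```python
from typing import Optional


def load_text(raw: bytes, flags: str, legacy_encoding: Optional[str] = "windows-1252") -> str:
    """Decode one stored text row (hypothetical helper, not MediaWiki's API).

    Rows flagged 'utf-8' (or 'utf8') were saved after the upgrade and are
    decoded as UTF-8. Unflagged rows predate the upgrade; if a legacy
    encoding is configured, they are converted from it at read time.
    """
    flag_set = {f.strip() for f in flags.split(",") if f.strip()}
    if "utf-8" in flag_set or "utf8" in flag_set or legacy_encoding is None:
        return raw.decode("utf-8", errors="replace")
    # Old, unflagged row: interpret the bytes in the legacy encoding.
    return raw.decode(legacy_encoding, errors="replace")


# An old latin-1 row and a new UTF-8 row both come out as the same string.
print(load_text("Ação".encode("latin-1"), ""))
print(load_text("Ação".encode("utf-8"), "utf-8"))
```

The point of doing it on read is that nothing in the DB has to change: the old bytes stay as they are and are reinterpreted each time they are loaded.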
... wow, that's it? ;-) I thought it would be harder =) Thank God it's simple...
I'll try it ASAP and post the results here. However, a couple of questions:
- shouldn't upgrade1_5.php convert text entries as well? What's the point of "half converting" to utf-8?
- will $wgLegacyEncoding be around in future releases? After all, the remaining non-utf8 chars will be in the DB forever, since the upgrade script didn't catch them.
- would a "manual" conversion using iconv (as erchache2000 suggested on this thread), after upgrade1_5.php and update.php have been applied, successfully convert the whole DB from latin-1 to utf-8? Could this have any side effects that could compromise the DB? (e.g. if images or any other binary data are stored in the DB, I don't know if iconv is smart enough not to mess with "latin-1 chars" it might find within binary content)
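To make the iconv worry concrete, here is a small Python illustration. It assumes the whole dump is treated as latin-1 and re-encoded to utf-8 in one blanket pass, which is roughly what `iconv -f latin1 -t utf-8` over a dump would do. Legacy text converts fine, but anything upgrade1_5.php already converted (such as titles) gets encoded twice, and every byte at or above 0x80 in binary data turns into two bytes. The helper name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Blanket conversion: treat every byte as a latin-1 char and re-encode as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Legacy latin-1 page text: converted correctly.
legacy = "Ação".encode("latin-1")
print(latin1_to_utf8(legacy).decode("utf-8"))  # Ação

# A title that the upgrade already stored as UTF-8: double-encoded.
title = "Ação".encode("utf-8")
print(latin1_to_utf8(title).decode("utf-8"))  # AÃ§Ã£o

# Binary data: the high byte 0x89 becomes C2 89, so the blob grows and is corrupted.
blob = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG signature
print(len(blob), len(latin1_to_utf8(blob)))  # 4 5
```

So a blind whole-DB pass would only be safe if it were restricted to the columns and rows that are still latin-1.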
I don't mind having latin-1 chars left behind by the upgrade process as long as $wgLegacyEncoding is not a temporary workaround, so if the answer to question #2 is "yes", you can forget I asked #3... ;-)
Thanks again for the help,
Andre