Hi Brion,
On Wed, 16 Nov 2005 13:08:59 -0800
Brion Vibber <brion(a)pobox.com> wrote:
> Andre Oliveira da Costa wrote:
>> If I follow it, "missing" content is there, with latin-1 chars (so we're
>> back to the "commas" issue again). Page title is utf-8, but the remaining
>> of the content is latin-1.
>> Judging by this, it seems the upgrade1_5.php script did convert URLs
>> (and consequently page titles) from latin-1 to utf-8, but some or all of
>> pages content was not converted.
> This is normal; set $wgLegacyEncoding for runtime conversion of old text
> entries.
... wow, that's it? ;-) I thought it would be harder =) Thank God it's
simple...
I'll try it ASAP and post the results here. However, a couple of questions:
- shouldn't upgrade1_5.php convert text entries as well? What's the point in
"half converting" to utf-8?
- will $wgLegacyEncoding still be around in future releases? After all, the
remaining non-utf8 chars will stay in the DB forever, since the upgrade
script didn't catch them.
- would a "manual" conversion using iconv (as erchache2000 suggested on this
thread), run after upgrade1_5.php and update.php have been applied
successfully, convert the whole DB from latin-1 to utf-8? Could this have any
side effects that could compromise the DB? (e.g. if images or any other
binary data are stored in the DB, I don't know if iconv is smart enough not
to mess with "latin-1 chars" it might find within binary content)
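
To make that last question concrete: a blanket iconv pass from latin-1 to utf-8 rewrites every byte above 0x7F, so values that are already utf-8 (and any binary data) would be re-encoded along with the rest. Below is a minimal Python sketch of a more cautious per-value conversion. It is not MediaWiki code and not what iconv does; `to_utf8` and `legacy_encoding` are names made up for illustration, and it assumes it is only ever applied to text fields.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "latin-1") -> bytes:
    """Return raw as UTF-8 bytes, leaving already-valid UTF-8 untouched.

    Hypothetical helper for illustration only: values that fail to decode
    as UTF-8 are assumed to be in legacy_encoding and are re-encoded.
    """
    try:
        # Pure ASCII and well-formed UTF-8 both decode cleanly: keep as-is.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as legacy text and convert it.
        return raw.decode(legacy_encoding).encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"caf\xe9"))      # latin-1 input gets converted
    print(to_utf8(b"caf\xc3\xa9"))  # utf-8 input is returned unchanged
```

The validity check is only a heuristic: a few latin-1 byte sequences also happen to be well-formed utf-8, and binary content that is not valid utf-8 would still be mangled, so something like this would have to be limited to text columns.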
I don't mind having latin-1 chars left behind by the upgrade process as long
as $wgLegacyEncoding is not a temporary workaround, so if the answer to
question #2 is "yes" you can forget I asked #3... ;-)
Thanks again for the help,
Andre
--
Andre Oliveira da Costa
(costa(a)tecgraf.puc-rio.br)