Hi Brion,
I was finally able to dedicate some time to this. More detailed info is
below.
On Wed, 09 Nov 2005 14:23:33 -0800
Brion Vibber <brion(a)pobox.com> wrote:
Andre Oliveira da Costa wrote:
On Wed, 09 Nov 2005 13:43:10 -0800
Brion Vibber <brion(a)pobox.com> wrote:
Andre Oliveira da Costa wrote:
Ok, got it. Any chance this iso-8859-1 --> utf-8 problem will get fixed in
future releases?
Well, it worked fine for us, so I'm not sure what there is to fix?
Mmmh... "Houston, we got a problem" ;-)
Not sure why it didn't work for me, then. Does my description of the problem
on bug #3898 [http://bugzilla.wikimedia.org/show_bug.cgi?id=3898] give you
any clue about what could have failed on my upgrade (and, judging by some of
the replies, on others' as well)? If not, what additional info can I provide
to help you track down the problem?
* Operating system and version
FC4, all latest updates applied, kernel 2.6.14-1.1637_FC4
* PHP version
PHP 5.0.4
* PHP configuration
Should I send /etc/php.ini directly to your email address? It's quite
big to be posted here...
* Which PHP modules are installed
php-pear-5.0.4-10.5
php-pgsql-5.0.4-10.5
php-jpgraph-1.19-1.2.fc4.rf
php-snmp-5.0.4-10.5
php-mbstring-5.0.4-10.5
php-mysql-5.0.4-10.5
(I don't use all of them, some -- like snmp -- have been installed for
testing purposes; actually, I don't really _use_ PHP at home, apache
is off by default)
* MySQL version
MySQL 4.1.14
* MySQL configuration
~ cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1
[mysql.server]
user=mysql
basedir=/var/lib
[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
* LocalSettings.php
I don't recall doing any tweaking to it, but I can forward it to you if you need it.
* Any other modifications or settings on MediaWiki
I haven't applied any modifications to this test setup. The 1.4.0 installation
has some modifications, but they relate to access rights, not content.
Still, if this is relevant, I can provide them to you.
* Exact, step-by-step procedure you used for running the upgrade
- extract content from mediawiki-1.5.2.tar.gz at /var/www/html
- make symlink from /var/www/html/mediawiki-1.5.2 to /var/www/html/wiki
PS: [wiki] = /var/www/html/wiki
- create 'wikidb' DB on MySQL
- create 'wikiuser' with full rights on wikidb.*
- import MediaWiki 1.4.2 DB dump (with mysql -u wikiuser -h localhost -p
wikidb < wikidb-dump.sql)
- copy [wiki]/AdminSettings.sample to [wiki]/AdminSettings.php, configure
it with proper MySQL root user/password
- copy LocalSettings.php from current (1.4.0) installation to [wiki] dir
- run 'php -f maintenance/upgrade1_5.php' from dir [wiki] (I captured its
output in case it helps)
- run 'php -f maintenance/update.php' (captured output of this one as well)
- chown -R apache:apache [wiki]
After this, [host]/wiki is online, but content doesn't display right
-- latin-1 chars are replaced by commas.
After investigating the problem more thoroughly, I think I now have a better
understanding of what's happening: the commas are just Firefox's way of telling
me it found chars incompatible with the page encoding (sorry for the false
alarm, I should have realized this). In this case, latin-1 chars were left
unconverted during the translation, while the page encoding is utf-8. If I
force the browser to use latin-1 encoding, the page displays fine -- well,
almost; some content seems to be missing.
E.g. this page:
http://shadow/wiki/index.php?title=Padr%F5es_de_Programa%E7%E3o&action=…
which should point to a page titled "Padrões de Programação" appears as
missing (i.e. link appears as ...&action=edit). If I go to the special "dead
end pages" page, I see this link there:
http://shadow/wiki/index.php?title=Padr%C3%B5es_de_Programa%C3%A7%C3%A3o
If I follow it, the "missing" content is there, with latin-1 chars (so we're
back to the "commas" issue again). The page title is utf-8, but the rest of
the content is latin-1.
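To make the difference between the two links concrete, here is a small Python
sketch (Python is only used for illustration, MediaWiki itself is PHP, and the
variable names are made up). It percent-encodes the same title under both
charsets, then shows what happens when the latin-1 form is read back as utf-8,
which is presumably what the upgraded wiki does when it looks the title up:

```python
from urllib.parse import quote, unquote

title = "Padrões_de_Programação"

# Percent-encode the title's bytes under each charset.
latin1_url = quote(title, encoding="latin-1")
utf8_url = quote(title, encoding="utf-8")

print(latin1_url)  # Padr%F5es_de_Programa%E7%E3o
print(utf8_url)    # Padr%C3%B5es_de_Programa%C3%A7%C3%A3o

# Reading the latin-1 form back as utf-8 does not give the original title:
# the bytes F5, E7 and E3 are not valid utf-8 on their own, so they are
# substituted with the replacement character.
mangled = unquote(latin1_url, encoding="utf-8", errors="replace")
print(mangled == title)  # False
```

The first form matches the old-style link that shows up as missing, and the
second matches the link listed under "dead end pages".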
Judging by this, it seems the upgrade1_5.php script did convert URLs
(and consequently page titles) from latin-1 to utf-8, but some or all of
the page content was not converted.
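One rough way to check whether a given piece of stored text is still latin-1
is to try decoding its raw bytes as utf-8. The Python sketch below is only an
illustration of that idea, not what upgrade1_5.php does; looks_like_utf8 and
to_utf8 are hypothetical helper names, and the check is a heuristic (pure
ASCII text passes either way, and a few latin-1 byte sequences happen to be
valid utf-8 as well):

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as utf-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode as utf-8 only if raw is not valid utf-8 already."""
    if looks_like_utf8(raw):
        return raw
    # Assumption: anything that is not valid utf-8 is latin-1.
    return raw.decode("latin-1").encode("utf-8")


sample_latin1 = "Programação".encode("latin-1")
sample_utf8 = "Programação".encode("utf-8")

# Roughly what the browser does with latin-1 bytes on a utf-8 page:
# undecodable bytes become replacement characters.
print(sample_latin1.decode("utf-8", errors="replace"))

print(looks_like_utf8(sample_latin1))         # False
print(looks_like_utf8(sample_utf8))           # True
print(to_utf8(sample_latin1) == sample_utf8)  # True
```

Running something like this against text pulled from the upgraded database
would show which rows were converted and which were left as latin-1.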
I hope this helps you guys understand what happened -- and fix the upgrade
script so that I can migrate to 1.5.2 =)
* If possible, sample data files
Well, I would be happy to provide configuration files for MySQL and
PHP, and also the output from the conversion scripts. Just let me know if you
guys would like to take a look at them, and where I should send them. I
can also provide some content if it will help.
Chars being replaced by commas is not something I've ever seen.
You're 100% right, it was a silly oversight. My bad, sorry for the confusion.
Still, there is indeed some problem with the latin-1 --> utf-8 conversion
process...
Any ideas? Anything else I could provide?
Best,
Andre
--
Andre Oliveira da Costa
(costa(a)tecgraf.puc-rio.br)