[Mediawiki-l] Mysql, UTF-8: How is it supposed to work?

Dorthe Luebbert luebbert at globalpark.de
Fri Feb 17 12:01:40 UTC 2006


I wonder how the UTF-8 support in MediaWiki works and which 
combinations of database charset and output charset are valid.

As far as I understand, the default character set changed to UTF-8 in 
version 1.5. I therefore suppose that MediaWiki stores HTML entities in 
the database by default (because MySQL 4.0 does not fully support 
UTF-8). Right?

Yesterday we tried to move a 1.5x MediaWiki to MySQL 4.1 (the server 
was upgraded and the wiki was unfortunately affected). We found a 
character set mess within the latin1 database, which we cleaned up by 
find/replace in the dump file. Now we have UTF-8 content in the database, 
the character set for the tables is set to UTF-8, and utf8 is used as 
the charset in the output. We also enabled the experimental MySQL 5 flag. 
Some parts of the page work all right, others do not (e.g. page titles); 
the changelog file mentions this as a to-do.
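
To illustrate what such a mess typically looks like (this is an 
assumption about the general failure mode, not a record of what was in 
our dump): UTF-8 bytes that get read as latin1 turn every non-ASCII 
character into two odd-looking characters. A minimal Python sketch, with 
hypothetical helper names, of the damage and of the reverse step that a 
find/replace on the dump effectively performs:

```python
# Sketch only: shows UTF-8 text misread as latin1, and the reverse repair.
# The function names and the sample string are made up for illustration.

def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled: str) -> str:
    """Undo the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

original = "M\u00fcller"            # "Müller"
garbled = misread_as_latin1(original)

print(garbled)                      # two characters appear in place of "ü"
print(repair(garbled) == original)  # True
```

The repair only works when every character was damaged in the same way; 
a dump that mixes correct and misread text needs to be checked piece by 
piece.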

Now it's broken, and I would like to know which combination is supposed 
to work. Is the following a possible combination?
Database: Mysql 4.1
PHP: 5.1
Database-charset: Latin1, all content in the database is latin1
Output-charset: UTF-8
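
For this combination to work, something between the database and the 
browser has to convert the bytes. Whether MediaWiki or the MySQL 
connection does this is exactly my question; the Python sketch below 
only shows the conversion itself, with a hypothetical function name and 
sample value:

```python
# Sketch only: the transcoding step implied by "latin1 in the database,
# UTF-8 in the output". Not taken from MediaWiki or MySQL.

def latin1_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as latin1 and re-encode them as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

stored = b"M\xfcller"             # "Müller" as one byte per character
sent = latin1_to_utf8(stored)

print(len(stored), len(sent))     # the UTF-8 form is one byte longer
```

If this step is skipped, the browser receives the single latin1 byte 
while expecting UTF-8, and the character is displayed as invalid.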

Thanks for any hint.


