Fernando Correia wrote:
> I'm new to MediaWiki and I'd like some help
> about character encoding.
> I followed the MediaWiki installation instructions I was able to find on the
> Web. They said to use latin-1 encoding on MySQL.
> MediaWiki actually works with this configuration. But there is a strange
> behavior.
> If I open the table in the MySQL administration tool, the accented
> characters do not display correctly. They seem to be stored in some other
> encoding. Maybe UTF-8?
MediaWiki works exclusively in UTF-8. It is agnostic to the claimed
encoding of the database as long as the database doesn't damage data.
MySQL 4.0 and earlier have very poor character set support, with a
server-wide setting and no Unicode support at all. We simply put data in
and take it back out intact, all as UTF-8, and don't care what MySQL
thinks it is.
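That is also why the accented characters look wrong in the administration
tool: it is showing UTF-8 bytes as if they were latin-1. A rough sketch in
Python (not MediaWiki code, no database involved; the sample word and the
variable names are made up for illustration, and Python's "latin-1" codec
is assumed to be close enough to what the tool uses):

```python
# Illustration only: the byte-level effect, without any database.
text = "ação"  # sample page text, chosen arbitrarily

# MediaWiki hands MySQL the UTF-8 bytes of the text.
stored_bytes = text.encode("utf-8")

# A tool that trusts the latin-1 label decodes each byte as one character.
as_seen_by_tool = stored_bytes.decode("latin-1")

# MediaWiki reads the same bytes back and treats them as UTF-8 again.
read_back = stored_bytes.decode("utf-8")

print(as_seen_by_tool)    # garbled: two characters per accented letter
print(read_back == text)  # the data itself is intact
```

The bytes in the table are fine; only the tool's interpretation of them is
off.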
MySQL 4.1 and later have support for multiple character sets and
encodings selectable at runtime, including a limited *subset* of UTF-8
(at most three bytes per character, so nothing outside the Basic
Multilingual Plane).
The Unicode and UTF-8 support is incomplete and unsuitable for a general
Unicode-based site such as Wikipedia, so we have not put much effort
into explicitly using it. Until MySQL actually supports Unicode, there's
little incentive for us to do an upgrade with expensive recoding of
table and index encodings.
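To get a feel for what would not fit, here is a small Python sketch that
checks whether a string stays within three bytes per character in UTF-8,
which is assumed here to be the limit of the MySQL 4.1/5.0 utf8 character
set. The function name is hypothetical and nothing like it ships with
MediaWiki or MySQL:

```python
def fits_three_byte_utf8(s: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8.

    Assumption: the database's utf8 type stores at most 3 bytes per
    character, so anything needing 4 bytes cannot be stored intact.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in s)


samples = ["ação", "日本語", "\U0001D11E"]  # last one is a musical G clef
for sample in samples:
    print(repr(sample), fits_three_byte_utf8(sample))
```

Accented Latin letters and most CJK text pass; characters that need four
bytes, such as the musical symbol in the last sample, do not.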
MediaWiki has an experimental schema file for using the native UTF-8
encoding mode on MySQL 4.1/5.0. Be aware that this is experimental; you
may end up with inconsistent tables if you run the automated updates
(which expect the default, encoding-agnostic schema for MySQL 4.0 and
higher).
If using the explicitly UTF-8 schema, make sure you set
$wgDBmysql5 = true; in LocalSettings.php to have the connection flipped
into UTF-8 mode.
-- brion vibber (brion @ pobox.com)