I'm new to MediaWiki and I'd like some help with character encoding.
I followed the MediaWiki installation instructions I was able to find on the Web. They said to use latin-1 encoding on MySQL.
MediaWiki does work with this configuration, but I'm seeing some strange behavior.
If I open the table in the MySQL administration tool, the accented characters do not display correctly. They seem to be stored in some other encoding. Maybe UTF-8?
If I insert data into the table using another program, the accented characters look fine in the MySQL administration tool, but don't display correctly in MediaWiki.
It seems that although the database is using latin-1 encoding, MediaWiki is internally using some other.
I'm concerned about two things:
1. That the MediaWiki encoding will conflict with the MySQL encoding and cause data corruption.
2. That I won't be able to insert data into the tables from programs that I write. I need to do this to fill the user table.
I would appreciate any help with this.
Thanks!
Fernando Correia wrote:
> I'm new to MediaWiki and I'd like some help with character encoding.
> I followed the MediaWiki installation instructions I was able to find on the Web. They said to use latin-1 encoding on MySQL.
> MediaWiki does work with this configuration, but I'm seeing some strange behavior.
> If I open the table in the MySQL administration tool, the accented characters do not display correctly. They seem to be stored in some other encoding. Maybe UTF-8?
MediaWiki works exclusively in UTF-8. It is agnostic to the claimed encoding of the database as long as the database doesn't damage data.
MySQL 4.0 and earlier have very poor character set support, with a server-wide setting and no Unicode support at all. We simply put data in and take it back out intact, all as UTF-8, and don't care what MySQL thinks it is.
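Both symptoms described in the question follow from this. As a rough illustration, here is a minimal Python sketch. It is not MediaWiki code, and the function names are hypothetical. It assumes the administration tool decodes the stored bytes as ISO-8859-1 (latin-1) and that the other program writes latin-1 bytes.

```python
def as_seen_by_latin1_tool(text: str) -> str:
    """Simulate a tool that reads UTF-8 bytes but assumes they are latin-1.

    Assumption: the tool decodes stored bytes as ISO-8859-1.
    """
    stored_bytes = text.encode("utf-8")   # what a UTF-8 application writes
    return stored_bytes.decode("latin-1")  # what a latin-1 tool displays


def as_seen_by_utf8_reader(raw: bytes) -> str:
    """Simulate a reader that treats every stored byte string as UTF-8.

    Bytes that are not valid UTF-8 become U+FFFD replacement characters.
    """
    return raw.decode("utf-8", errors="replace")


# UTF-8 data viewed through a latin-1 tool: the accent turns into two characters.
print(as_seen_by_latin1_tool("café"))  # cafÃ©

# latin-1 data (as another program might insert it) read back as UTF-8:
# the single byte 0xE9 is not valid UTF-8 on its own, so it is replaced.
print(as_seen_by_utf8_reader("café".encode("latin-1")))
```

If this model matches your setup, a program that fills the tables itself would need to hand MySQL the UTF-8 bytes of each string (for example `text.encode("utf-8")`) rather than latin-1 bytes. How to get those bytes through unchanged depends on the client library and connection settings.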
MySQL 4.1 and later have support for multiple character sets and encodings selectable at runtime, including a limited *subset* of UTF-8.
The Unicode and UTF-8 support is incomplete and unsuitable for a general Unicode-based site such as Wikipedia, so we have not put much effort into explicitly using it. Until MySQL actually supports Unicode, there's little incentive for us to do an upgrade with expensive recoding of table and index encodings.
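To make the "limited subset" concrete: in MySQL 4.1 and 5.0 the utf8 character set stores at most three bytes per character, so characters outside the Basic Multilingual Plane do not fit. The following Python sketch (the helper name is hypothetical, and it only models the byte-length limit, not MySQL's actual behavior on insert) checks whether a string stays within that subset.

```python
def fits_three_byte_utf8(text: str) -> bool:
    """Return True if every character encodes to at most 3 bytes in UTF-8.

    Assumption: this models only the three-byte-per-character limit of the
    MySQL 4.1/5.0 utf8 character set, nothing else about the server.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_three_byte_utf8("café"))        # True: 'é' needs 2 bytes
print(fits_three_byte_utf8("日本語"))       # True: each character needs 3 bytes
print(fits_three_byte_utf8("\U0001D11E"))  # False: U+1D11E needs 4 bytes
```

Text that fails this check is the kind of data a general Unicode site can contain and that the native UTF-8 mode of those MySQL versions cannot store.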
MediaWiki has an experimental schema file for using the native UTF-8 encoding mode on MySQL 4.1/5.0. Be aware that this is experimental; you may end up with inconsistent tables if you run the automated updates (which expect the default, encoding-agnostic schema for MySQL 4.0 and higher).
If using the explicitly UTF-8 schema, make sure you set $wgDBmysql5 = true; in LocalSettings.php to have the connection flipped into UTF-8 mode.
-- brion vibber (brion @ pobox.com)
mediawiki-l@lists.wikimedia.org