Raymond Wan wrote:
Oh...I see -- thanks for this!
Then I guess there are two combinations: MediaWiki with latin1 MySQL; MediaWiki with UTF-8 MySQL. What are the advantages / disadvantages of either choice?
I *guess* that if someone were to log in to mysql directly and do a SELECT, then the UTF-8 would look like gibberish. Likewise when a dump is done of the data. Of course, neither "problem" affects MediaWiki's functionality...
Any other pros/cons?
Thanks!
Ray
MediaWiki offers you three character sets for MySQL:
* MySQL 4.1/5.0 binary
* MySQL 4.1/5.0 UTF-8
* MySQL 4.0 backwards-compatible UTF-8
In all three modes MediaWiki stores UTF-8 encoded text. The difference is in how MySQL treats it.
In "backwards-compatible UTF-8" mode, MySQL thinks the data is latin1. The data will "look wrong" when viewed directly, and if you don't pass --default-character-set=latin1 to mysqldump 4.1 or newer, the dump will corrupt the text (it will "helpfully" convert what it believes is latin1 into UTF-8). This is the only mode that works with MySQL 4.0, and it supports the full Unicode range.
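To see why the dump gets corrupted, here is a small sketch of the conversion in Python. It only simulates the effect: the function names are made up for illustration, and Python's "latin-1" codec is an assumption standing in for MySQL's latin1 (which is actually closer to cp1252).

```python
# Simulation only: UTF-8 bytes stored in a column that MySQL believes is latin1.
# Assumption: Python's "latin-1" codec approximates MySQL's latin1 charset.

def stored_bytes(text: str) -> bytes:
    """The bytes MediaWiki hands to MySQL: plain UTF-8."""
    return text.encode("utf-8")

def helpful_conversion(raw: bytes) -> bytes:
    """Read every stored byte as one latin1 character, then encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

def undo_conversion(damaged: bytes) -> bytes:
    """Reverse the conversion above to get the original UTF-8 bytes back."""
    return damaged.decode("utf-8").encode("latin-1")

original = "caf\u00e9"                 # "café"
raw = stored_bytes(original)           # the e-acute is stored as 2 bytes
damaged = helpful_conversion(raw)      # each of those 2 bytes becomes 2 bytes

print(raw)
print(damaged)
print(damaged.decode("utf-8"))         # what a reader of the dump would see
```

In this sketch the damage is reversible because every byte maps to some latin-1 character. With a real dump that may not hold, so it is safer to avoid the conversion than to repair it afterwards.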
The UTF-8 mode uses MySQL's native UTF-8 support, which currently limits you to the Basic Multilingual Plane (code points up to U+FFFF). The data will "look right". The indexes will be larger.
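If you want to check whether your text would fit inside that limit, something like the following sketch would do. The helper names are hypothetical, and the check rests on the fact that MySQL 4.1/5.0's utf8 stores at most 3 bytes per character, which is exactly what the Basic Multilingual Plane needs.

```python
# Sketch: which strings stay inside the Basic Multilingual Plane (BMP)?
# Code points above U+FFFF need 4 bytes in UTF-8 and fall outside the BMP.

def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

def fits_in_bmp(text: str) -> bool:
    """True if every code point in the text is U+FFFF or lower."""
    return all(ord(ch) <= 0xFFFF for ch in text)

samples = {
    "ascii": "wiki",
    "japanese": "\u65e5\u672c\u8a9e",   # three CJK characters, 3 bytes each
    "g_clef": "\U0001D11E",             # musical symbol outside the BMP
}

for name, text in samples.items():
    widths = [utf8_width(ch) for ch in text]
    print(name, widths, "fits" if fits_in_bmp(text) else "does not fit")
```

Most living languages fit. What falls outside is mainly rare CJK ideographs, historic scripts, and some symbol blocks.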
Binary mode works almost like backwards-compatible UTF-8, but MySQL treats the text as opaque data and won't mess with it. It will look messy when viewed directly. You get the full Unicode range.
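As a rough illustration of what "opaque data" means, this sketch treats a page title as nothing but bytes. It is not MediaWiki or MySQL code, and the variable names are made up; it only shows that the bytes round-trip untouched and that lengths are counted in bytes, not characters.

```python
# Sketch: a binary column just holds bytes. Nothing is converted on the way
# in or out, so decoding as UTF-8 is entirely up to the application.

title = "\u00dcn\u00efcode \U0001D11E"   # includes a character outside the BMP
raw = title.encode("utf-8")              # what would be stored

round_tripped = raw.decode("utf-8")      # what the application reads back
char_count = len(title)                  # length as the wiki user sees it
byte_count = len(raw)                    # length as the database sees it

print(raw.hex(" "))                      # one "messy" view of the stored bytes
print(char_count, byte_count)
```

The gap between the two counts is worth remembering when sizing columns, since the limit applies to bytes.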