Raymond Wan wrote:
Oh...I see -- thanks for this!
Then I guess there are two combinations: Mediawiki with latin1 MySQL; Mediawiki with UTF MySQL.
What are the advantages / disadvantages of either choice?
I *guess* that if someone were to log in to mysql directly and do a SELECT, then the UTF
would look like gibberish. Likewise when a dump is done of the data. Of course, neither
"problem" affects Mediawiki's functionality...
Any other pros/cons?
Thanks!
Ray
MediaWiki offers you three character sets for MySQL:
* MySQL 4.1/5.0 binary
* MySQL 4.1/5.0 UTF-8
* MySQL 4.0 backwards-compatible UTF-8
In all three modes MediaWiki stores UTF-8 data. The difference is in
how MySQL treats it.
In "backwards-compatible UTF-8" MySQL thinks the data is latin1. The data
will "look wrong", and if you don't pass --default-character-set=latin1 to
mysqldump 4.1 or newer, the dump will corrupt the text (it will "helpfully"
convert it to UTF-8 a second time). This is the only mode that works with
MySQL 4.0, and it supports the full Unicode range.
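
To make the corruption concrete, here is a minimal Python sketch of what
happens to the bytes. It is not MediaWiki or MySQL code, and the function
names are made up for illustration. It also assumes Python's latin-1 codec
is a fair stand-in for MySQL's latin1 (which is really closer to cp1252, so
a few byte values would differ in practice).

```python
def seen_by_latin1_client(text: str) -> str:
    """What a latin1 client displays for text stored as UTF-8 bytes."""
    return text.encode("utf-8").decode("latin-1")


def dump_converted_to_utf8(text: str) -> bytes:
    """Simulate a dump that believes the column is latin1 and converts it.

    The stored bytes are already UTF-8, so the conversion encodes them twice.
    """
    stored = text.encode("utf-8")          # bytes the wiki actually wrote
    misread = stored.decode("latin-1")     # server believes this is latin1
    return misread.encode("utf-8")         # "helpful" conversion to UTF-8


def repair_double_encoding(dumped: bytes) -> str:
    """Undo the double encoding, assuming nothing else touched the bytes."""
    return dumped.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "é"
    print(repr(seen_by_latin1_client(original)))
    print(dump_converted_to_utf8(original))
    print(repair_double_encoding(dump_converted_to_utf8(original)) == original)
```

The same reasoning explains why the data only "looks wrong" in a client:
nothing is lost as long as the bytes are passed through unconverted.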
The UTF-8 mode uses MySQL's native UTF-8 support, which currently limits
you to the Basic Multilingual Plane. The data will "look right", but the
indexes will be larger.
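
If you want to check whether some text would survive that limit, a rough
Python sketch follows. It assumes the limit is simply "code points up to
U+FFFF", which is what the Basic Multilingual Plane means; the helper names
are hypothetical and nothing here talks to MySQL.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def fits_in_bmp(text: str) -> bool:
    """True if every character of text lies inside the BMP."""
    return all(ord(ch) <= BMP_MAX for ch in text)


def chars_outside_bmp(text: str) -> list[str]:
    """List the characters that the UTF-8 mode could not store."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


if __name__ == "__main__":
    for sample in ["Wikipedia", "日本語", "clef: \U0001D11E"]:
        print(repr(sample), fits_in_bmp(sample), chars_outside_bmp(sample))
```

Characters outside the BMP take four bytes in UTF-8, while everything
inside it takes at most three.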
Binary works almost like backwards-compatible UTF-8, but MySQL treats the
data as opaque bytes and won't mess with it. It will look messy in a
client, but you get the full Unicode range.