[Mediawiki-l] Mysql, UTF-8: How is it supposed to work?
brion at pobox.com
Fri Feb 17 12:20:25 UTC 2006
Dorthe Luebbert wrote:
> I wonder how the UTF-8-Support in Mediawiki works and what valid
> combinations of database charsets and output charsets are.
> As far as I understand in version 1.5 the default character set has
> changed to UTF-8.
The default has been UTF-8 since a long long time ago. In some older versions
(possibly as late as 1.3), a handful of European languages had to be installed
in Latin-1, English defaulted to UTF-8 but could optionally be Latin-1, and
every other language was UTF-8.
As of 1.4, UTF-8 was the default for all languages.
As of 1.5, Latin-1 is no longer supported.
> Therefore I suppose Mediawiki stores HTML-entities in
> the database per default (because Mysql 4.0 does not fully support
> UTF-8). Right?
MySQL through 4.0 doesn't have native support for Unicode, so we just treat the
fields as binary and store UTF-8 data in them directly.
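
To illustrate what "treat the fields as binary" amounts to, here is a rough Python sketch (not MediaWiki's actual code, which is PHP). The helper names to_binary_field and from_binary_field are made up for the example. The point is only that the application does the UTF-8 encoding and decoding itself and the database sees nothing but bytes:

```python
# Sketch: the application owns the encoding; the database column is
# treated as an opaque byte string. Helper names are hypothetical.

def to_binary_field(text: str) -> bytes:
    """Encode text to the raw UTF-8 bytes that get stored as-is."""
    return text.encode("utf-8")


def from_binary_field(raw: bytes) -> str:
    """Decode bytes read back from the field, assuming they are UTF-8."""
    return raw.decode("utf-8")


title = "Z\u00fcrich"            # "Zürich"
raw = to_binary_field(title)

# 6 characters, but 7 bytes: the u-umlaut takes two bytes in UTF-8.
print(len(title), len(raw))
print(from_binary_field(raw) == title)
```

Because the server never interprets the bytes, no entities or escaping are needed; the flip side is that the server's own idea of the column's character set plays no part.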
MySQL 4.1 and later have somewhat fancier character set options including some
broken Unicode support. By default, MediaWiki continues to treat 4.1 the same way
as 4.0 and earlier; data is chucked in and retrieved as raw UTF-8 without worrying
about the server's character set configuration.
Generally this works fine, though sometimes you'll get surprises if you let
MySQL do implicit character conversion based on what it _thinks_ your tables
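
One common shape of such a surprise can be imitated in plain Python, with no database involved. This is only a sketch of the general failure mode, under the assumption that something in the chain believes the bytes are Latin-1 and "helpfully" converts them to UTF-8; the repair function is a hypothetical helper, not anything MediaWiki or MySQL provides:

```python
# Sketch: UTF-8 bytes mistaken for Latin-1 and converted to UTF-8 again.

original = "Z\u00fcrich"                    # "Zürich"
stored = original.encode("utf-8")           # what the application sent

# The converter assumes Latin-1, so every byte becomes one character.
misread = stored.decode("latin-1")          # the familiar mojibake
double_encoded = misread.encode("utf-8")    # now 9 bytes instead of 7


def repair(data: bytes) -> str:
    """Undo exactly one round of Latin-1 -> UTF-8 double encoding."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


print(misread)
print(len(stored), len(double_encoded))
print(repair(double_encoded) == original)
```

The reverse step only works when the damage is exactly one round of this conversion and nothing was truncated along the way.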
In current 1.5 releases you may optionally have the tables created with the
UTF-8 character set explicitly set, and UTF-8 explicitly set on the db connection.
This may or may not be helpful for some people for some reason; but mostly it will:
* Make indexes larger (3 bytes per character)
* Cause failures if you use characters outside the BMP (Basic Multilingual Plane) in page titles,
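
Both points follow from a 3-byte-per-character limit, which covers only the Basic Multilingual Plane (BMP, code points up to U+FFFF). A small Python sketch of the arithmetic; the function names and the 255-character figure are illustrative assumptions, not taken from MediaWiki's schema code:

```python
# Sketch: why a 3-byte UTF-8 column inflates indexes and rejects
# characters beyond the Basic Multilingual Plane.

MAX_BYTES_PER_CHAR = 3  # the per-character limit mentioned above


def utf8_len(ch: str) -> int:
    """Number of bytes one character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_bmp(text: str) -> bool:
    """True if every character is at or below U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def reserved_index_bytes(max_chars: int) -> int:
    """Worst-case key size if each character is budgeted 3 bytes."""
    return max_chars * MAX_BYTES_PER_CHAR


samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]  # A, e-acute, euro, G clef
for ch in samples:
    print(f"U+{ord(ch):04X}", utf8_len(ch), fits_in_bmp(ch))

# A hypothetical 255-character title field budgets 765 bytes of key space.
print(reserved_index_bytes(255))
```

The last sample (U+1D11E) needs four bytes in UTF-8, so it cannot be represented under a 3-byte limit, while the binary-field approach stores it without complaint.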
-- brion vibber (brion @ pobox.com)