Ian Smith wrote:
- Due to the limitations of MySQL's Unicode support, by default we continue to treat MySQL fields as binary and store pure UTF-8 Unicode in them, although MySQL may have them listed as Latin-1 depending on your server's defaults.
Surely this is a bug? If MW wants binary fields, then surely it should explicitly create them as binary, instead of leaving it up to some random server default?
In MySQL 4.0, there were no table or column character sets; there was only a server character set. You could specify a "binary" modifier on columns, altering the collation, which we duly did. Our 4.0-compatible schema thus uses binary collations for varchar columns, but does not specify a character set, since there was no way to do that in MySQL 4.0.
As of MediaWiki 1.9, there is an installer option to select a "MySQL 5 binary" schema, which does specify a binary character set.
-- Tim Starling