Re: [Wikitech-l] Use binary to represent varchar rather than UTF8

10 Jan 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
howard chen wrote:
> ...
> i would like to know if `wikipedia` is going to use this method in the future?
If MySQL supported UTF-8, we'd be happy to make use of it. Using proper
character sets gives us warm, fuzzy feelings and makes it easier to work
in terminals and other direct-database tools, as well as potentially
making it easier to use built-in database support for case-insensitive
lookups and proper sorting.
BUT... at the moment MySQL only supports the subset of UTF-8 that
corresponds to UCS-2 (limited to the lower 16 bits of Unicode's encoding
space, i.e. code points up to U+FFFF).
Thus characters from a number of scripts encoded outside of that range
cannot be represented without resorting to storing raw UTF-8 in binary
fields. Since we already have data outside that range, we don't plan to
reduce our functionality to shoehorn it into a broken UTF-8 implementation.
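To make the limitation concrete, here is a minimal Python sketch (not MediaWiki or MySQL code; the function names and sample strings are made up for illustration). It checks whether a string contains code points above U+FFFF, which need four bytes in UTF-8 and so fall outside the UCS-2 range, and shows that encoding to raw UTF-8 bytes, as one would for a binary field, round-trips them unchanged.

```python
def needs_four_byte_utf8(text):
    """Return True if any character lies above U+FFFF (outside UCS-2)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def utf8_lengths(text):
    """Return the UTF-8 encoded length, in bytes, of each character."""
    return [len(ch.encode("utf-8")) for ch in text]


# Hypothetical sample data, chosen only to cover the interesting cases.
samples = {
    "latin": "wiki",
    "cjk": "\u4e2d\u6587",      # inside the 16-bit range, 3 bytes each
    "gothic": "\U00010330",     # GOTHIC LETTER AHSA, above U+FFFF, 4 bytes
}

for name, text in samples.items():
    raw = text.encode("utf-8")           # what would go into a binary field
    assert raw.decode("utf-8") == text   # raw bytes round-trip losslessly
    print(name, needs_four_byte_utf8(text), utf8_lengths(text))
```

Only the last sample trips the check, which is exactly the kind of text that forces the binary-field workaround.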
(There is an experimental MediaWiki mode for using UTF-8 collation, but
since the functionality in MySQL is incomplete it doesn't fully work.
It's also not properly integrated with the updaters, so an experimental
database in this mode may not properly update on version upgrades.)
We have expressed our interest in full UTF-8 support (or UTF-16 with
proper conversion should do fine!) to MySQL, but as far as I know it's
still not on the roadmap as of 5.2. Maybe some more lobbying is in order. ;)
Since the foreseeable future does not include full UTF-8 support in
MySQL, when we upgrade to 5.0 or 5.1 we expect to continue using binary
encoding.
We do, though, plan to 'formalize' that a bit more, with proper binary
charset/collation labeling on the fields, rather than the ad-hoc method
we've used since 4.0. The experimental 'binary' schema on MediaWiki 1.9
or later can be tested to play with this, but be warned it's even more
experimental than the UTF-8 one at the moment.
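As a rough illustration of what binary labeling implies for comparisons, the following Python sketch (an analogy only, not the MediaWiki schema; the titles and the helper name are hypothetical) contrasts byte-wise ordering and equality on raw UTF-8 with a simple case-folded comparison done on the application side. Note that `str.casefold` is only a stand-in here and is not a real language-aware collation.

```python
# Hypothetical page titles, used only for illustration.
titles = ["apple", "Zebra", "Éclair", "banana"]

# Binary semantics: order by the raw UTF-8 bytes, so uppercase ASCII sorts
# before lowercase, and non-ASCII letters sort after all ASCII.
binary_order = sorted(titles, key=lambda t: t.encode("utf-8"))

# Application-side approximation of a case-insensitive ordering.
folded_order = sorted(titles, key=str.casefold)


def binary_equal(a, b):
    """Compare two strings the way a binary column would: byte for byte."""
    return a.encode("utf-8") == b.encode("utf-8")


print(binary_order)   # ['Zebra', 'apple', 'banana', 'Éclair']
print(folded_order)   # ['apple', 'banana', 'Zebra', 'Éclair']
print(binary_equal("Wiki", "wiki"))            # False
print("Wiki".casefold() == "wiki".casefold())  # True
```

The trade-off is the one described earlier in this message: binary storage keeps every character intact, at the cost of doing case-insensitive lookups and sorting somewhere other than the database's built-in collations.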
- -- brion vibber (brion @ pobox.com)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFpKQUwRnhpk1wk44RAr84AKCLte+paYICxJqwkZIVYVzEOx9AJwCgk1ao
0oWl2P0ac4rgPUwCMOU5y4c=
=1Msn
-----END PGP SIGNATURE-----
