On 1/10/07, howard chen <howachen@gmail.com> wrote:
> besides fully UTF8 support, using binary will have the advantages of saving space for indexes, so will it also alter the decision?
It is possible that when MySQL gets Unicode support beyond UCS-2 (i.e. beyond the BMP), it will come in the form of UTF-8, which is what we currently store packed into those binary fields.
This isn't at all unlikely. Supporting non-BMP characters without dealing with variable-length characters requires UCS-4 (UTF-32), and I think that even MySQL users would balk at another needless 2x increase in the size of their ASCII data. Since the effort required to work with UTF-16 (i.e. the variable-length encoding built on two-byte code units) is similar to that for UTF-8, it may make sense to go all the way to UTF-8 and get the space savings for ASCII.
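To put rough numbers on the size argument, here is a small Python sketch (nothing MySQL-specific; the sample strings and the helper name `encoded_sizes` are my own choices for illustration). It compares how many bytes an ASCII string and a single non-BMP character take in each encoding, with no byte-order mark:

```python
# Byte cost of the same text in the three encodings discussed above.
samples = {
    "ascii": "hello",
    "non_bmp": "\U00010348",  # GOTHIC LETTER HWAIR, outside the BMP
}

def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

The ASCII string takes 5, 10 and 20 bytes respectively, while the non-BMP character takes 4 bytes in all three: UTF-16 needs a surrogate pair for it, so it is already variable length at that point.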
It's also possible that by the time MySQL has support for non-BMP characters, it may have support for functional indexes, allowing the index to operate on a different datatype than the row... (although a straightforward type conversion would kill one of the primary advantages of using a real string type: collation which isn't total nonsense)
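As a rough illustration of the collation point, the Python sketch below sorts the same words by raw UTF-8 bytes (which is what a binary field compares) and by a crude text key. The key function `rough_text_key` is a hypothetical stand-in that I made up for this example; it is not how MySQL or any real collation works, it only strips accents and case:

```python
import unicodedata

words = ["Zebra", "apple", "\u00e9clair"]  # "\u00e9" is e with acute accent

def binary_key(s):
    # What a binary column effectively sorts on: the encoded bytes.
    return s.encode("utf-8")

def rough_text_key(s):
    # Hypothetical stand-in for a real collation: decompose, drop
    # combining marks (accents), then fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

binary_order = sorted(words, key=binary_key)
text_order = sorted(words, key=rough_text_key)

print(binary_order)
print(text_order)
```

Byte order puts "Zebra" first and the accented word last, while the text key gives apple, éclair, Zebra. An index built on the binary form would follow the first ordering.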
(and please trim your replies)