I don't want to defend MySQL development decisions (in fact PHP made some similarly bad ones, but it would be unfair to judge them too harshly with the "power of hindsight" [0]), but... /pedantic on

On Thu, Oct 14, 2021 at 7:37 PM Roy Smith <roy@panix.com> wrote:

What part of "universal" did they not understand?

... several years ago, around the end of the last century and the start of this one, almost no one used UTF-8 [1] and PHP didn't even support multi-byte strings. The original spec for UTF-8 called for up to 6 bytes per character [2]. The BMP (at most 3 bytes in UTF-8), however, contained characters for most modern languages [3]. Supporting the full 6 bytes would have been a waste of space and performance, because at the time MySQL worked much faster with fixed-width columns, and those would have needed double the space. My guess is that someone said "this is probably good enough". Would it be too outrageous to think that we might not need as many extra characters as there are stars in our Galaxy, when fewer than 65K were practically needed?
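To make the space argument concrete, here is a rough sketch of the arithmetic in Python. It assumes a fixed-width column reserves the encoding's maximum byte width for every character; the helper name fixed_width_bytes and the 255-character column are made up for illustration and are not anything from MySQL itself.

```python
# Hypothetical helper (not a MySQL function): bytes reserved by a fixed-width
# column of n_chars characters, assuming every character slot is sized for
# the widest possible encoding of one character.
def fixed_width_bytes(n_chars: int, max_bytes_per_char: int) -> int:
    return n_chars * max_bytes_per_char


# Illustrative 255-character column.
bmp_only = fixed_width_bytes(255, 3)       # BMP only: at most 3 bytes per character
original_spec = fixed_width_bytes(255, 6)  # RFC 2279: up to 6 bytes per character

print(bmp_only, original_spec, original_spec / bmp_only)

# Size of the code space in each case.
print(2 ** 16)  # code points in the BMP
print(2 ** 31)  # code points addressable by 6-byte UTF-8 as defined in RFC 2279
```

Under that assumption, every fixed-width column doubles in size to make room for characters that almost nobody needed at the time.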

3 things changed after that:

* Unicode limited UTF-8 to encoding 21 bits in 2003 [4], requiring only 4 bytes: just one more than MySQL's utf8

* Apple wanted to sell iPhones in Japan, so emoji were added to Unicode in 2010, and they subsequently became hugely popular

* MySQL/InnoDB has been highly optimized for the fast handling of variable-length strings
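As a small illustration of why the 3-byte limit started to hurt, this Python sketch prints how many bytes a few sample characters take in UTF-8 and whether they fall inside the BMP. The helper names utf8_len and fits_in_bmp are my own, and the BMP check is only a stand-in for "would fit in a 3-byte-per-character column", not a test against MySQL itself.

```python
# Hypothetical helpers for illustration only.
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_bmp(text: str) -> bool:
    """True if every code point is in the Basic Multilingual Plane (<= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# ASCII letter, Latin letter with diacritic, CJK ideograph, emoji (outside the BMP).
samples = ["a", "\u00f1", "\u65e5", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}", utf8_len(ch), fits_in_bmp(ch))
```

Only the last sample, the emoji, needs a fourth byte, and it is exactly the kind of character that became common after 2010.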

However, you cannot just arbitrarily break backwards compatibility and change the meaning of an existing configuration option, especially with storage software that has continuously supported incremental upgrades for as long as I can remember. What you can do is support the new standard, encourage its usage, make it the default, etc.

This is a bit off-topic here (feel free to PM me to continue the conversation), and just to be clear, I am _not fully justifying the decisions_, just giving historical context, but I want to end with some lessons relevant to the list:

* It is very difficult to build future-proof applications. PHP, MySQL and MediaWiki all have a long history, and we should be gentle when we judge them from the future. In my own work with backups, even storing data unchanged for over 5 years is sometimes challenging, because encryption algorithms are found to be weak, or end up unsupported/unavailable within just 2 releases of the operating system!

* Standards also change; they are not as "universal" as we may want to believe (there have been 32 extra Unicode versions since 1991). I also expect that new collations, not implemented today, will be needed in the future.

* It is ok to make "mistakes", as long as we learn from them and improve upon them :-)

Sorry for the text block.

[0] <url:https://powerlisting.fandom.com/wiki/Hindsight>

[1] <url:https://commons.wikimedia.org/wiki/File:Utf8webgrowth.svg>

[2] <url:https://www.rfc-editor.org/rfc/rfc2279>

[3] <url:https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane>

[4] <url:https://www.rfc-editor.org/rfc/rfc3629>