On Tue, 27 May 2003, Lee Daniel Crocker wrote:
I confess ignorance here. Are there really languages for which the simplest canonical representation in Unicode requires combining forms?
Off the top of my head, one Aleutian language (Unangam Tunuu) uses x-with-circumflex; Guarani apparently uses g-with-tilde. Tone marks for Chinese Zhuyin phonetic script are combining characters; I think the Indian scripts are pretty dependent on this kind of thing as well.
Precombined characters are theoretically only included for round-trip conversion with legacy character sets, so they're not really making new ones for orthographies that are just getting started in the wonderful world of character encoding.
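For anyone who wants to check this, here is a rough sketch using Python's standard unicodedata module. The helper name compose() is made up for illustration; the only assumption is that composing normalization (NFC) will produce a single code point whenever a precomposed character exists for the pair.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT
TILDE = "\u0303"       # COMBINING TILDE


def compose(base: str, mark: str) -> str:
    """Hypothetical helper: return base + combining mark after NFC normalization."""
    return unicodedata.normalize("NFC", base + mark)


if __name__ == "__main__":
    samples = [
        ("e-with-circumflex", "e", CIRCUMFLEX),
        ("x-with-circumflex", "x", CIRCUMFLEX),
        ("g-with-tilde", "g", TILDE),
    ]
    for name, base, mark in samples:
        composed = compose(base, mark)
        # List the code points that remain after composition.
        print(name, [f"U+{ord(c):04X}" for c in composed])
```

The e-with-circumflex case should collapse to the single precomposed code point U+00EA, while x-with-circumflex and g-with-tilde stay as a base letter followed by a combining mark, since no precomposed form exists for them.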
If so, then I remove the restriction, but we must then specify a specific canonical representation for titles in each language, as you suggest; perhaps something like a Stringprep profile would be needed.
They've thought of that already too, it seems. :) See Unicode Standard Annex #15, "Unicode normalization forms": http://www.unicode.org/unicode/reports/tr15/
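As a rough illustration of how a normalization form could give titles one canonical representation, here is a minimal Python sketch. The names canonical_title() and same_title() are hypothetical, and the choice of NFC as the default is an assumption for the example, not something settled in this thread.

```python
import unicodedata


def canonical_title(title: str, form: str = "NFC") -> str:
    """Hypothetical helper: normalize a title to one Unicode normalization form."""
    return unicodedata.normalize(form, title)


def same_title(a: str, b: str) -> bool:
    """Compare two titles after normalizing both to the same form."""
    return canonical_title(a) == canonical_title(b)


if __name__ == "__main__":
    precomposed = "Guaran\u00ed"   # i-with-acute as one code point
    decomposed = "Guarani\u0301"   # plain i followed by COMBINING ACUTE ACCENT

    print(precomposed == decomposed)            # raw strings differ
    print(same_title(precomposed, decomposed))  # equal once normalized
```

The two spellings differ as raw code point sequences but compare equal after normalization, which is the property a canonical title form would need.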
-- brion vibber (brion @ pobox.com)