Re: [Wikitech-l] Title characters

28 May 2003


      On Tue, 27 May 2003, Lee Daniel Crocker wrote:
...
> I confess ignorance here. Are there really languages for which the
> simplest canonical representation in Unicode requires combining forms?
Off the top of my head, one Aleutian language (Unangam Tunuu) uses
x-with-circumflex; Guarani apparently uses g-with-tilde. Tone marks for
the Chinese Zhuyin phonetic script are combining characters; I think the
Indian scripts are pretty dependent on this kind of thing as well.
Precombined characters are theoretically only included for round-trip
conversion with legacy character sets, so they're not really making new
ones for orthographies that are just getting started in the wonderful
world of character encoding.
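
The point about missing precombined characters can be checked with a minimal
sketch like the one below. It assumes Python's standard unicodedata module,
which is only an illustration here and has nothing to do with the wiki code.
The helper name nfc_length is made up for this example. NFC composition folds
e + combining acute into a single code point, but x + combining circumflex and
g + combining tilde stay as two, because no precombined forms exist for them.

```python
import unicodedata

def nfc_length(base: str, combining: str) -> int:
    """Hypothetical helper: count code points left after NFC composition."""
    return len(unicodedata.normalize("NFC", base + combining))

# e + COMBINING ACUTE ACCENT has a precombined form (U+00E9).
e_acute = nfc_length("e", "\u0301")
# x + COMBINING CIRCUMFLEX ACCENT and g + COMBINING TILDE do not.
x_circumflex = nfc_length("x", "\u0302")
g_tilde = nfc_length("g", "\u0303")

print(e_acute, x_circumflex, g_tilde)  # prints: 1 2 2
```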
...
> If so, then I remove the restriction, but we must then specify a
> specific canonical representation for titles in each language, as you
> suggest; perhaps something like a Stringprep profile would be needed.
They've thought of that already too, it seems. :)
See Unicode Standard Annex #15, "Unicode normalization forms":
http://www.unicode.org/unicode/reports/tr15/
-- brion vibber (brion @ pobox.com)
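
As a rough illustration of what a canonical representation for titles buys,
here is a minimal sketch using Python's standard unicodedata module. The
choice of NFC and the helper name canonical_title are assumptions made for
this example; the message above only points at the annex and does not pick a
normalization form.

```python
import unicodedata

def canonical_title(title: str) -> str:
    """Hypothetical helper: reduce a title to NFC so equivalent spellings match."""
    return unicodedata.normalize("NFC", title)

precomposed = "Caf\u00e9"   # e-with-acute as a single code point
decomposed = "Cafe\u0301"   # e followed by COMBINING ACUTE ACCENT

# The raw strings differ even though they display identically.
same_raw = precomposed == decomposed
# After normalization both spellings map to the same title.
same_canonical = canonical_title(precomposed) == canonical_title(decomposed)

print(same_raw, same_canonical)  # prints: False True
```

Normalizing once, when a title is stored or looked up, would mean two visually
identical titles cannot end up as separate pages.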
