steve vertigo wrote:
correction -- the utf-8 seems to be only one that
works... I thought the bottom text on the frontpage
was unicode...
Terminology note:
"Unicode" is a _character set_, which maps abstract numerical code
points to characters. Unicode code points (and hence characters) may be
represented in a number of ways.
"UTF-8" is a _character encoding_, which maps Unicode code points to
variable-length sequences of bytes. UTF-8's primary feature is that it
is compatible with ASCII, which has made it popular in Unix and internet
contexts as a more or less backwards-compatible way of storing Unicode text.
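To make the variable-length part concrete, here is a small Python sketch (my own illustration; the helper name utf8_bytes and the sample characters are arbitrary choices). It shows how many bytes a few code points take in UTF-8, and that pure ASCII text comes out byte-for-byte the same:

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 byte sequence for a single character."""
    return ch.encode("utf-8")

# Sample code points: U+0041, U+00E9, U+20AC, U+1F600 (arbitrary picks).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} bytes)")

# Text that is pure ASCII encodes to exactly the same bytes in UTF-8.
ascii_text = "plain old ASCII"
ascii_compatible = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(ascii_compatible)
```

Only the first sample fits in a single byte; the others need two, three and four bytes respectively.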
"UTF-16" is another character encoding, which maps Unicode code points
to 16-bit integers. (Or, for code points above U+FFFF, to a pair of
16-bit integers known as a surrogate pair.) For
historical reasons and/or stupidity ;) UTF-16 (or its evil elder sister
UCS-2) may get called "Unicode" by some software. If you select
so-called "Unicode" encoding for a page that's encoded in UTF-8, you'll
probably corrupt the display.
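A rough Python sketch of both points, assuming big-endian UTF-16 without a byte order mark for the first part (the helper utf16_units and the sample strings are made up for illustration). It lists the 16-bit units for two characters, then shows what happens when UTF-8 bytes get read as if they were UTF-16:

```python
def utf16_units(text: str) -> list[int]:
    """Return the 16-bit code units of text in UTF-16 (big-endian, no BOM)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

# U+20AC fits in one 16-bit unit; U+1F600 needs two.
print([hex(u) for u in utf16_units("\u20ac")])
print([hex(u) for u in utf16_units("\U0001F600")])

# Reading UTF-8 bytes with the wrong decoder: each pair of bytes gets
# glued together into one unrelated character.
original = "caf\u00e9 "
utf8_data = original.encode("utf-8")
garbled = utf8_data.decode("utf-16-le", errors="replace")
print(ascii(original), "->", ascii(garbled))
```

The garbled result is shorter than the original and shares no characters with it, which is roughly what a browser shows when the wrong encoding is selected.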
There are also many domain-specific ways of encoding Unicode characters;
in HTML and XML (and SGML, if the document character set is defined as
Unicode) you can use sequences such as &#12345; (decimal) or &#x1234;
(hexadecimal). Because these only use ASCII characters to do their dirty
work, they're robust through other character encoding conversions and
can be typed in any text editor (if you know the numbers). However, they
are specific to that type of markup language, take up more space than
binary encodings, and don't necessarily survive form submission intact
if let through unencoded.
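For what it's worth, Python's standard library can go both ways with these numeric references. This is only a sketch with arbitrary sample text: html.unescape decodes references, and the xmlcharrefreplace error handler writes non-ASCII characters out as decimal references:

```python
import html

# Decoding: decimal and hexadecimal references both name a code point.
decoded_decimal = html.unescape("&#12345;")  # code point 12345 = U+3039
decoded_hex = html.unescape("&#x1234;")      # code point 0x1234 = U+1234
print(ascii(decoded_decimal), ascii(decoded_hex))

# Encoding: anything outside ASCII is replaced by a decimal reference,
# so the result is plain ASCII text.
text = "\u3039 and \u1234"
as_ncr = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(as_ncr)

# The ASCII-only form is bulkier than the same text in UTF-8.
ncr_size = len(as_ncr.encode("ascii"))
utf8_size = len(text.encode("utf-8"))
print(ncr_size, utf8_size)
```

Note that the encoder always emits the decimal form, so U+1234 comes back out as &#4660; rather than &#x1234;.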
-- brion vibber (brion @ pobox.com)