Re: [Wikitech-l] Arabikipedia

10 Jul 2003

Well, thanks for clearing *that* up. -- %)

-S-

...
 "Unicode" is a _character set_, which maps abstract numerical code
 points to characters. Unicode code points (and hence characters) may
 be represented in a number of ways.
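
A minimal Python 3 sketch of the code-point idea. Python and the helper name `describe` are illustrative choices, not anything from this thread. The built-ins `ord` and `chr` convert between a character and its integer code point without involving any byte encoding at all, and `unicodedata.name` looks up the character's official Unicode name.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    cp = ord(ch)          # character -> abstract code point (a plain integer)
    assert chr(cp) == ch  # code point -> character round-trips
    return f"U+{cp:04X} {unicodedata.name(ch)}"


# Escapes keep this source pure ASCII: A, e-acute, Arabic ain, euro sign.
for ch in ["A", "\u00e9", "\u0639", "\u20ac"]:
    print(describe(ch))
```

Note that nothing here says how many bytes a character occupies; that is the job of an encoding.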

 "UTF-8" is a _character encoding_, which maps Unicode code points to
 variable-length sequences of bytes. UTF-8's primary feature is that it
 is compatible with ASCII, which has made it popular in Unix and
 internet contexts as a more or less backwards-compatible way of
 storing Unicode text.
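
To illustrate the variable-length and ASCII-compatible properties, here is a small Python 3 sketch (the helper `utf8_hex` is a hypothetical name for illustration). It encodes a few characters with Python's built-in UTF-8 codec and shows the resulting bytes in hex.

```python
def utf8_hex(text: str) -> list:
    """Encode text as UTF-8 and return each byte as a two-digit hex string."""
    return [f"{b:02X}" for b in text.encode("utf-8")]


# Pure ASCII text encodes to exactly the same bytes under UTF-8 and ASCII.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")

# Other code points need two, three, or four bytes.
# Samples: A, e-acute, Arabic ain, euro sign, musical G clef (U+1D11E).
samples = ["A", "\u00e9", "\u0639", "\u20ac", "\U0001D11E"]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {' '.join(utf8_hex(ch))}")
```

Run as-is, it should print one line per sample, from `U+0041 -> 41` (one byte, identical to ASCII) up to `U+1D11E -> F0 9D 84 9E` (four bytes).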

 "UTF-16" is another character encoding, which maps Unicode code points
 to 16-bit integers. (Or, for code points above U+FFFF, to two 16-bit
 integers, a so-called surrogate pair.) For historical reasons and/or
 stupidity ;) UTF-16 (or its evil elder sister UCS-2) may get called
 "Unicode" by some software. If you select so-called "Unicode" encoding
 for a page that's encoded in UTF-8, you'll probably corrupt the
 display.
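
A rough Python 3 sketch of both points (the helper `utf16_units` is a hypothetical name, and big-endian without a byte-order mark is an arbitrary choice for readability). It shows one 16-bit unit for a character in the Basic Multilingual Plane, two units for a character above U+FFFF, and the garbage you get when UTF-8 bytes are decoded as if they were UTF-16 (little-endian here, also chosen arbitrarily).

```python
def utf16_units(text: str) -> list:
    """Return the UTF-16 code units (16-bit integers) for text, big-endian, no BOM."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A code point in the Basic Multilingual Plane fits in one 16-bit unit...
print([hex(u) for u in utf16_units("\u0639")])      # Arabic ain
# ...but one above U+FFFF needs two units (a surrogate pair).
print([hex(u) for u in utf16_units("\U0001D11E")])  # musical G clef

# Mislabeling: the UTF-8 bytes of "Wiki" read as UTF-16 give two
# unrelated CJK ideographs instead of four Latin letters.
garbled = "Wiki".encode("utf-8").decode("utf-16-le")
print(garbled)
```

UCS-2, by contrast, has no surrogate mechanism, so it can only represent code points up to U+FFFF.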

 There are also many domain-specific ways of encoding Unicode
 characters; in HTML and XML (and SGML, if the document character set
 is defined as Unicode) you can use sequences such as &#12345;
 (decimal) or &#x1234; (hexadecimal). Because these only use ASCII
 characters to do their dirty work, they're robust through other
 character encoding conversions and can be typed in any text editor
 (if you know the numbers). However, they are specific to that type of
 markup language, take up more space than binary encodings, and don't
 necessarily survive forms well if let through unencoded.
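
A minimal Python 3 sketch of numeric character references (the helper `to_ncr` is a hypothetical name). It replaces every non-ASCII character with a decimal or hexadecimal reference, so the result is pure ASCII, and uses the standard library's `html.unescape` to decode it again. It deliberately does not escape markup-special characters such as `&` or `<`. Note that &#4660; and &#x1234; name the same code point, since 0x1234 = 4660.

```python
import html


def to_ncr(text: str, hexadecimal: bool = False) -> str:
    """Replace non-ASCII characters with numeric character references."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)              # plain ASCII passes through unchanged
        elif hexadecimal:
            out.append(f"&#x{cp:X};")   # e.g. &#x20AC;
        else:
            out.append(f"&#{cp};")      # e.g. &#8364;
    return "".join(out)


s = "caf\u00e9 \u20ac"                  # "cafe" with e-acute, a space, a euro sign
print(to_ncr(s))                        # caf&#233; &#8364;
print(to_ncr(s, hexadecimal=True))      # caf&#xE9; &#x20AC;
assert html.unescape(to_ncr(s)) == s    # the references decode back to the original
```

For the decimal form alone, Python's built-in `xmlcharrefreplace` error handler does the same job: `s.encode("ascii", "xmlcharrefreplace")`.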

 -- brion vibber (brion @ pobox.com)

 _______________________________________________
 Wikitech-l mailing list
 Wikitech-l(a)wikipedia.org
 http://mail.wikipedia.org/mailman/listinfo/wikitech-l