Hi Chris,
On 20 November 2018 03:39:02 GMT+05:30, Chris Koerner <ckoerner(a)wikimedia.org>
wrote:
== Did you know? ==
Thanks for the informative "Did you know?" section. It was an interesting read. :-)
* Letters are encoded internally by computers as numbers; for example,
“A” is 65 and “a” is 97.[3] Years ago, programs and even websites
would use different encodings[4] to represent text, often leading to
unreadable gibberish on screen. Unicode[5] was intended to be a single
encoding for most of the world’s writing systems. The most-used parts
of it fit into a 16-bit representation,[6] which can handle about 65
thousand characters. But that's not enough for the very large number
of rare and historical Chinese, Japanese, and Korean (CJK) characters,
which are represented in 16-bit Unicode using “surrogate pairs”.[7]
1,024 Unicode characters are set aside to be “high surrogates”—the
first half of a 32-bit character—and 1,024 characters are set aside to
be “low surrogates”—the second half. By themselves, the surrogates
aren’t valid and don’t represent anything, but in pairs they can
represent over a million additional characters. Since these characters
are usually rare, software can sometimes treat them incorrectly and split
them up, which can result in you seeing the Unicode replacement
character �,[8] which is used when something has gone wrong processing
Unicode text. (When the character is fine, but you don’t have a font
to show it, you sometimes get little squares instead. Since the most
common source of these squares for English speakers is unrepresented
CJK characters, a slang term for the squares is “tofu”.[9])
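
To make the surrogate arithmetic described above concrete, here is a
minimal Python sketch. The helper names (to_surrogates, from_surrogates)
are made up for illustration and are not part of any library. The example
character U+20000 is just one code point above U+FFFF, picked from the
CJK Extension B range. It shows letters as numbers, how a code point is
split into a high and a low surrogate, and how cutting a pair in half
can produce the replacement character.

```python
from typing import Tuple

# Hypothetical helpers illustrating UTF-16 surrogate-pair arithmetic.
HIGH_START = 0xD800  # first of the 1,024 high surrogates (U+D800..U+DBFF)
LOW_START = 0xDC00   # first of the 1,024 low surrogates (U+DC00..U+DFFF)


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000  # fits in 20 bits
    high = HIGH_START + (offset >> 10)   # top 10 bits
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# Letters are stored as numbers.
assert ord("A") == 65 and ord("a") == 97

# 1,024 high surrogates times 1,024 low surrogates: over a million pairs.
assert 1024 * 1024 == 1_048_576

ch = "\U00020000"  # an ideograph outside the 16-bit range
high, low = to_surrogates(ord(ch))
assert from_surrogates(high, low) == ord(ch)

# The same character as UTF-16 bytes: two 16-bit units, four bytes total.
utf16 = ch.encode("utf-16-be")

# Keeping only the first half of the pair leaves a lone high surrogate,
# which a lenient decoder turns into the replacement character U+FFFD.
broken = utf16[:2].decode("utf-16-be", errors="replace")
print(hex(high), hex(low), repr(broken))
```

With errors="strict" (the default) the last decode would raise
UnicodeDecodeError instead, which is usually the safer choice when you
want to notice truncated text rather than display it.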
[0] https://phabricator.wikimedia.org/T168427
[1] https://phabricator.wikimedia.org/T209293
[2] https://phabricator.wikimedia.org/T209156
[3] https://en.wikipedia.org/wiki/ASCII#Printable_characters
[4] https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings
[5] https://en.wikipedia.org/wiki/Unicode
[6] https://en.wikipedia.org/wiki/UTF-16
[7] https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates
[8] https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
[9] https://en.wiktionary.org/wiki/tofu#Noun
--
Sivaraam