Hi everyone,
Hi Platonides,
Ok.
Finding the "Hex UTF-8 bytes" representation of a "Hex code point"
is not intuitive.
In the FAQ entry "What is UTF-8?" at
"http://www.cl.cam.ac.uk/~mgk25/unicode.html",
I found part of the answer to my question.
Let's consider the "Hex code point" 0xC3.
What is the sequence of bits used to represent that character
as "Hex UTF-8 bytes"?
The binary representation of 0xC3 is 1100 0011.
Since the first bit of this byte is 1 (and not 0), the code point is
greater than 0x7F and does not fit in a single UTF-8 byte, so
we use the following two-byte "pattern" to represent that
code:
110xxxxx 10xxxxxx
and replace the "x" with the proper bits.
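The choice of pattern depends on how large the code point is. Here is a minimal sketch in Python of that choice, assuming the standard UTF-8 ranges; the function name utf8_length is only mine, for illustration:

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 needs for a code point (hypothetical helper)."""
    if code_point <= 0x7F:
        return 1  # 0xxxxxxx
    if code_point <= 0x7FF:
        return 2  # 110xxxxx 10xxxxxx
    if code_point <= 0xFFFF:
        return 3  # 1110xxxx 10xxxxxx 10xxxxxx
    if code_point <= 0x10FFFF:
        return 4  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    raise ValueError("code point out of Unicode range")

print(utf8_length(0xC3))  # 2
```

So 0xC3 falls in the second range, which is why the two-byte pattern applies.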
To fill in the pattern, we read the binary representation of 0xC3
from right to left, and fill the "x" from right to left as well:
- 1st bit from the right of 0xC3 binary representation: 1
Replace the rightmost x in 110xxxxx 10xxxxxx with 1:
110xxxxx 10xxxxx1
- 2nd bit from the right of 0xC3 binary representation: 1
Replace the rightmost remaining x in 110xxxxx 10xxxxx1 with 1:
110xxxxx 10xxxx11
- 3rd bit from the right of 0xC3 binary representation: 0
Replace the rightmost remaining x in 110xxxxx 10xxxx11 with 0:
110xxxxx 10xxx011
- 0
110xxxxx 10xx0011
- 0
110xxxxx 10x00011
- 0
110xxxxx 10000011
- 1 (the second byte is now full, so we continue in the first byte)
110xxxx1 10000011
- 1
110xxx11 10000011
And replace the remaining "x" with zeros:
11000011 10000011
The hexadecimal representation of 11000011 is 0xC3.
The hexadecimal representation of 10000011 is 0x83.
Hence the "Hex UTF-8 bytes" representation of 0xC3 is 0xC3 0x83.
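To double-check the steps above, here is a minimal sketch in Python that fills the two-byte pattern with bit operations and compares the result with Python's built-in encoder. The function name utf8_two_byte is only mine, and it assumes the code point lies in the range 0x80..0x7FF covered by this pattern:

```python
def utf8_two_byte(code_point):
    """Fill the 110xxxxx 10xxxxxx pattern for a code point in 0x80..0x7FF."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte pattern only covers 0x80..0x7FF")
    # 110xxxxx: everything above the lowest 6 bits (at most 5 bits here)
    first = 0b11000000 | (code_point >> 6)
    # 10xxxxxx: the lowest 6 bits
    second = 0b10000000 | (code_point & 0b00111111)
    return bytes([first, second])

encoded = utf8_two_byte(0xC3)
print(encoded.hex())                          # c383
print(encoded == chr(0xC3).encode("utf-8"))   # True
```

For 0xC3 the upper bits are 11 and the lower six bits are 000011, which gives the same two bytes as the manual procedure.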
Is that it?
Thanks and all the best,
--
Lmhelp
--
View this message in context:
http://old.nabble.com/Web-page-source---%22strange%22-characters-tp27999218…
Sent from the WikiMedia General mailing list archive at
Nabble.com.