Re: [Wikitech-l] Fun with URL encoding

11 Oct 2002


      Paul Ebermann wrote:
...
"Brion VIBBER" skribis:
...
Note that _theoretically_ a legal UTF-8 sequence could also be legal ISO 
8859-1.
[eo] Cxu vere?
Mi pensis ke en la komenco de la dua duono de ISO-8859-1
estas kelkaj numeroj reservita (kontrola kodoj) -
128 gxis 159, se mi memoras gxuste. Tiuj estas la bitokoj
de la formo 100xxxxx, kiuj ja povas aperi en UTF-8 (en
la dua aux sekvaj bitokoj de UTF-8-kodita signo).
Jes ja, sed ne cxiuj UTF-8-kodoj trovigxas en la gamo rezervita; se la 
sekva(j) bitoko(j) formas laux 101xxxxx ili trovigxas en la gamo 
160-191, kiu konsistigxas el diversaj punkciiloj kaj simboloj. Ekzemple:
Ã¡ -> á
   0xC3 0xA1 -> 0x00E1
   110(00011) 10(100001) -> 0000000011100001
Malofta bitokaro en latino-1, certe, sed lauxnorma.
...
[en] Really?
I thought that at the start of the second half of
ISO-8859-1 some numbers are reserved (control codes) -
128 to 159, if I remember correctly. Those are the octets
of the form 100xxxxx, which can occur in UTF-8 (in the
second or subsequent octets of a UTF-8-encoded character).
Sure, but not all UTF-8 codes will find themselves in the reserved 
range; if the tail byte(s) are in the form 101xxxxx they'll be in the 
160-191 range, which is populated by various punctuation marks and 
symbols. For instance:
Ã¡ -> á
   0xC3 0xA1 -> 0x00E1
   110(00011) 10(100001) -> 0000000011100001
Not a terribly likely sequence of bytes in Latin-1, but it's legal.
-- brion vibber (brion @ pobox.com)
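
For anyone who wants to check the example by hand, here is a minimal Python sketch. It is not part of the original message, and the helper names (utf8_two_byte_code_point, tail_is_latin1_control) are made up for illustration. It decodes the two bytes 0xC3 0xA1 both ways and redoes the bit arithmetic; the manual decoder only handles the two-byte form and does not reject overlong encodings.

```python
# The two bytes from the example: 0xC3 0xA1.
raw = bytes([0xC3, 0xA1])

# Read as UTF-8 they form one character; read as ISO 8859-1 they form two.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")


def utf8_two_byte_code_point(lead: int, tail: int) -> int:
    """Combine a 110xxxxx lead byte and a 10xxxxxx tail byte by hand.

    Hypothetical helper: two-byte sequences only, no overlong check.
    """
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("lead byte is not of the form 110xxxxx")
    if tail & 0b11000000 != 0b10000000:
        raise ValueError("tail byte is not of the form 10xxxxxx")
    # Five payload bits from the lead byte, six from the tail byte.
    return ((lead & 0b00011111) << 6) | (tail & 0b00111111)


def tail_is_latin1_control(tail: int) -> bool:
    """True if the byte is in the 128-159 range (form 100xxxxx)."""
    return 0x80 <= tail <= 0x9F


print(as_utf8, [hex(ord(c)) for c in as_latin1])
print(hex(utf8_two_byte_code_point(0xC3, 0xA1)))
print(tail_is_latin1_control(0xA1))
```

The tail byte 0xA1 is of the form 101xxxxx, so it falls in the 160-191 range rather than the reserved 128-159 range, which is why the pair is also legal ISO 8859-1.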
