On Sun, 12 Jan 2003 03:32:20 -0500, Pierre Abbat phma=ce9h4FcxEoVIf6P1QZMOBw@public.gmane.org wrote:
If the character is between 128 and 255 inclusive, present it as a single byte. If it's Greek, give the HTML character name. Else turn it into a number.
Actually, if it's between 128 and 159, reject it outright. In ISO-8859-1 and Unicode those code points are control characters, so they have no printable meaning on the web at all. Unfortunately, they do have meaning in the default "Windows" character set (Windows-1252), so a certain Word processor from a very large software corporation with a poor reputation litters its documents with #146, #147, etc. in the guise of "smart quotes", and these fail to render in some good browsers. Perhaps the input processor could clean the text, replacing these characters with their Unicode equivalents via a lookup table?
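
Something along these lines might do for the lookup table. This is only a rough sketch in Python, not a worked-out design: the names build_table and clean_text are made up for illustration, and it assumes the input has already been read in as text with each byte mapped straight to the code point of the same number (i.e. decoded as Latin-1). It leans on Python's built-in "cp1252" codec to fill in the table, and it silently drops the five values that Windows-1252 leaves undefined; rejecting them with an error would be an equally reasonable choice.

```python
def build_table():
    """Map code points 128-159 to their Windows-1252 meanings.

    Hypothetical helper: values that Windows-1252 leaves undefined
    (129, 141, 143, 144, 157) map to None, which makes str.translate
    delete them.
    """
    table = {}
    for code in range(0x80, 0xA0):
        try:
            table[code] = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            table[code] = None  # undefined in Windows-1252: drop it
    return table


C1_TABLE = build_table()


def clean_text(text):
    """Replace stray 128-159 characters with Unicode equivalents.

    Assumes `text` was decoded as Latin-1, so a "smart quote" byte such
    as 146 shows up as the control character U+0092.
    """
    return text.translate(C1_TABLE)


if __name__ == "__main__":
    sample = "\x93It\x92s fine\x94"
    print(clean_text(sample))
```

Characters at 160 and above pass through untouched, so the rest of the encoding rules (single byte, HTML character name, or number) can be applied afterwards to the cleaned text.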