On Sunday 12 January 2003 10:07, Tomasz Wegrzanowski wrote:
On Sun, Jan 12, 2003 at 06:48:08AM -0800, Toby Bartels wrote:
As for the forbidden numerical character entities from &#128; to &#154;, we can interpret them as if they came from Micro$oft (most likely) and convert them to whatever they should be (by table). (If any other forbidden numerical entities have common nonstandard uses, then we can adopt those as well as long as they translate to good Unicode.)
They translate to Unicode 128-154. Unicode 0-255 is identical with ISO-8859-1.
The problem is that some contributors, apparently copy-pasting from some word processor in Windows or from 1911, enter those characters as if they mean something. Then I see an article with boxes in it, try to guess what the boxes are supposed to be, and often get it wrong. 80-9f in ISO-8859-1 map to 0080-009f in Unicode, but in both they are C1 control codes rather than printable characters, and they display as boxes, slugs, spaces, or nothing. (9f is 159, btw.)
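
For illustration, the "by table" conversion could look roughly like the Python sketch below. It assumes the stray entities really were meant as Windows-1252 bytes, and it uses Python's built-in cp1252 codec as the table. The function name fix_cp1252_entities is made up for this example; it is not part of any existing software.

```python
import re

# Matches decimal numeric character references such as &#128; or &#8364;
_NCR = re.compile(r"&#(\d+);")

def fix_cp1252_entities(text):
    """Rewrite &#128;..&#159; as the Unicode code points that
    Windows-1252 assigns to those byte values (hypothetical helper)."""
    def repl(match):
        code = int(match.group(1))
        if 128 <= code <= 159:
            try:
                # Assumption: the contributor meant a Windows-1252 byte.
                char = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # 129, 141, 143, 144 and 157 have no Windows-1252
                # meaning, so there is nothing sensible to guess.
                return match.group(0)
            return "&#%d;" % ord(char)
        # Everything outside 128-159 is left exactly as written.
        return match.group(0)
    return _NCR.sub(repl, text)
```

Entities with no Windows-1252 assignment are left untouched on purpose, so a human can still find them and decide what was intended.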
phma