[Wikipedia-l] Box characters and question marks
Pierre Abbat
phma at webjockey.net
Sun Sep 1 16:59:26 UTC 2002
On Sunday 01 September 2002 12:48, lcrocker at nupedia.com wrote:
> Character codes 128..159 don't exist in ISO-8859-1, or in
> Unicode. They're empty, illegal codes that represent nothing.
> How some browser, OS, or font chooses to display them is
> entirely a matter of taste--some display boxes, some display
> question marks, some display nothing at all. It doesn't
> matter, because there isn't any "correct" way to display
> codes that don't represent anything.
>
> The problem is that /some/ character sets, notably Microsoft
> Windows code page 1252, /do/ use those character codes for things
> like curly quotes and em dashes. To be correctly encoded for
> Wikipedia, they should be changed to HTML entities referencing
> either the character name (e.g., "lsquo"), or the correct Unicode
> value. Copying Windows text with those things directly into
> Wikipedia creates the illegal characters. When we see that
> happen, we should try to figure out what they're supposed to be
> and replace them with correct ones.
The problem with doing that is that they all look alike to me, or to anyone
else who isn't running Windows. Generally if I see one box character that's
supposed to be, say, an apostrophe, I replace all the boxes with apostrophes.
If, however, there's a measurement in there in seconds, and I can't tell from
the box whether it's supposed to be seconds or minutes, I'll put the wrong
character in it.
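The bytes themselves are still distinct even when every one of them renders as the same box, so the guessing can in principle be avoided. Here is a minimal sketch in Python, assuming the pasted text really came from Windows code page 1252; the function name is made up for illustration, and it emits numeric entities rather than named ones:

```python
def cp1252_to_entities(raw: bytes) -> str:
    """Replace bytes 0x80..0x9F with numeric HTML entities.

    Assumes the input is Windows-1252 text. Other bytes are passed
    through as ISO-8859-1 characters.
    """
    out = []
    for b in raw:
        if 0x80 <= b <= 0x9F:
            try:
                ch = bytes([b]).decode("cp1252")
                out.append(f"&#{ord(ch)};")
            except UnicodeDecodeError:
                # A few codes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are
                # unassigned in cp1252; mark them for manual review.
                out.append("&#xFFFD;")
        else:
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)
```

For example, the byte 0x92 (the Windows curly apostrophe) comes out as `&#8217;` and 0x97 (the em dash) as `&#8212;`, so the two no longer look alike.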
I left Quercus a note about finding boxes with a hex editor. I use khexedit
on Linux, but he's obviously not using Linux. Can you recommend a hex editor
for him?
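Failing a hex editor, a few lines of script would do the same job of locating the boxes. This is only a sketch, assuming Python is available on his machine; the function name is hypothetical:

```python
def find_c1_bytes(raw: bytes):
    """Return (offset, byte value) pairs for every byte in 0x80..0x9F."""
    return [(i, b) for i, b in enumerate(raw) if 0x80 <= b <= 0x9F]


def format_hits(hits):
    """Render the hits one per line, offsets and values in hex."""
    return "\n".join(f"offset {i:#06x}: byte {b:#04x}" for i, b in hits)
```

Reading the saved article text in binary mode, e.g. `open(path, "rb").read()`, and passing the result to `find_c1_bytes` lists where each box is and which code it actually holds.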
phma