[Wikipedia-l] Box characters and question marks

Pierre Abbat phma at webjockey.net
Sun Sep 1 16:59:26 UTC 2002


On Sunday 01 September 2002 12:48, lcrocker at nupedia.com wrote:
> Character codes 128..159 don't exist in ISO-8859-1, or in
> Unicode.  They're empty, illegal codes that represent nothing.
> How some browser, OS, or font chooses to display them is
> entirely a matter of taste--some display boxes, some display
> question marks, some display nothing at all.  It doesn't
> matter, because there isn't any "correct" way to display
> codes that don't represent anything.
>
> The problem is that /some/ character sets, notably Microsoft
> Windows code page 1252, /do/ use those character codes for things
> like curly quotes and em dashes.  To be correctly encoded for
> Wikipedia, they should be changed to HTML entities referencing
> either the character name (e.g., "lsquo"), or the correct Unicode
> value.  Copying Windows text with those things directly into
> Wikipedia creates the illegal characters.  When we see that
> happen, we should try to figure out what they're supposed to be
> and replace them with correct ones.

The problem with doing that is that they all look alike to me, or to anyone 
else who isn't running Windows. Generally if I see one box character that's 
supposed to be, say, an apostrophe, I replace all the boxes with apostrophes. 
If, however, there's a measurement in there in minutes or seconds, I can't 
tell which mark a given box is supposed to be, and I may put the wrong 
character in it.
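
One way around the guessing would be to let a script do the lookup, since the byte value itself says which character was meant. Here is a rough Python sketch, not anything Wikipedia actually runs. The function name cp1252_to_entities is made up for illustration, and it assumes the page text is available as raw bytes that are ISO-8859-1 apart from the stray Windows characters.

```python
def cp1252_to_entities(raw: bytes) -> str:
    """Replace bytes 0x80-0x9F with numeric HTML entities.

    Assumption: everything outside that range is plain ISO-8859-1,
    and the stray bytes came from Windows code page 1252.
    """
    out = []
    for b in raw:
        if 0x80 <= b <= 0x9F:
            try:
                ch = bytes([b]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes in this range are unassigned even in cp1252;
                # mark them so a human can look at them.
                out.append("?")
                continue
            # Numeric entity for the Unicode code point the byte stands for.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)
```

With this, the bytes for "it" + 0x92 + "s" come out as "it&#8217;s", so a single curly quote and a double one no longer look the same.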

I left Quercus a note about finding boxes with a hex editor. I use khexedit 
on Linux, but he's obviously not using Linux. Can you recommend a hex editor 
for him?
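
Failing a hex editor, a few lines of Python would do the same job of locating the boxes on any system. This is only a sketch under the same assumption as before (raw bytes, with the boxes being cp1252 characters), and find_c1_bytes is a hypothetical name, not an existing tool.

```python
import unicodedata


def find_c1_bytes(raw: bytes):
    """List (offset, byte value, Unicode name) for every byte in 0x80-0x9F.

    Assumption: such bytes were meant as Windows code page 1252 characters.
    """
    hits = []
    for offset, b in enumerate(raw):
        if 0x80 <= b <= 0x9F:
            try:
                name = unicodedata.name(bytes([b]).decode("cp1252"))
            except UnicodeDecodeError:
                # Some bytes in this range are unassigned in cp1252.
                name = "(unassigned in cp1252)"
            hits.append((offset, b, name))
    return hits
```

Calling it on the contents of a saved page, for example find_c1_bytes(open(path, "rb").read()) with path being wherever the page was saved, gives the position of each box and what it was meant to be.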

phma


