[Wikipedia-l] Switching everything to UTF-8

Peter Gervai grin at tolna.net
Wed Nov 19 08:45:46 UTC 2003


On Tue, Nov 18, 2003 at 02:49:12PM -0800, Delirium wrote:
> Peter Gervai wrote:
> >On Tue, Nov 18, 2003 at 04:28:32AM -0500, Daniel Mayer wrote:
> >>Peter Gervai wrote:

> I don't really see the problem with typing embed codes manually on, for 
> example, the English Wikipedia. 

Try typing Greek or Chinese phrases this way. Go on, try. Look at any
Chinese-related article, read the name there (don't cheat, don't open the
article for editing), and type its embed codes in your favourite editor. I'd
like to know what percentage of the Chinese characters you would guess right
just by looking at them. :-)

May I put bets on you? ;->
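The "embed codes" in question are HTML numeric character references: the character's Unicode code point written as `&#NNNN;`. A minimal Python sketch (the function name `to_numeric_refs` is hypothetical) shows why typing them by hand is hopeless: the number has to be looked up, because nothing about the glyph tells you what it is.

```python
def to_numeric_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal HTML numeric
    character reference (an "embed code"), e.g. U+4E2D -> &#20013;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# Greek and Chinese sample text: the codes must be looked up, not guessed.
print(to_numeric_refs("αβγ"))   # &#945;&#946;&#947;
print(to_numeric_refs("中文"))  # &#20013;&#25991;
```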

> Actually, I think with the current setup, on en: at least, you can type 
> them literally and when you hit submit or preview it'll automatically 
> convert them to the numeric codes.  I seem to recall this happening with 
> some Greek text I pasted in (though I could be mistaken).

Yes, you are pretty wrong here. Some browsers (like Mozilla) actually change
those characters into embed codes, which is *illegal*. At the time of the POST
there is no guarantee that the server wants _HTML_-encoded text (or text
encoded in any other way), so the browser makes a _wild_guess_ and picks the
most commonly used encoding.

The authors of other browsers say that they conform to the standards and are
not willing to make wild, baseless guesses, so their browsers do not submit
characters in a POST whose encoding type cannot represent them.
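The two browser behaviours can be imitated with Python's codec error handlers. This is only an analogy, not browser code: the helper names are hypothetical, and the form charset is assumed to be ISO-8859-1 (Latin-1). The "guessing" variant uses the real `xmlcharrefreplace` handler, which turns unencodable characters into `&#NNN;` references; the strict variant refuses to encode them at all.

```python
# Hypothetical helpers contrasting the two browser behaviours.
# Assumption: the edit form is served as ISO-8859-1 (Latin-1).
FORM_CHARSET = "iso-8859-1"


def guessing_browser(text: str) -> bytes:
    # Characters the form charset cannot hold are silently turned into
    # &#NNN; references (the Mozilla-style "wild guess").
    return text.encode(FORM_CHARSET, errors="xmlcharrefreplace")


def strict_browser(text: str) -> bytes:
    # Refuse to encode characters the declared charset cannot represent;
    # raises UnicodeEncodeError instead of guessing.
    return text.encode(FORM_CHARSET, errors="strict")


greek = "αβγ"
print(guessing_browser(greek))  # b'&#945;&#946;&#947;'

# The server gets the same bytes as if the user had typed the references
# literally, so it cannot tell the two cases apart.
print(guessing_browser(greek) == guessing_browser("&#945;&#946;&#947;"))  # True

try:
    strict_browser(greek)
except UnicodeEncodeError:
    print("not submitted: the charset cannot represent the text")
```

The ambiguity in the second `print` is the practical problem: once the guess has been made, the server has no way to recover what the user actually typed.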

Anyone may debate that, but I see no basis for forcing anyone to stop
following the standards.

(And not every browser does it this way. Some do it the other way,
submitting anything they like, and then PHP, MySQL and the rest of the
underlying crap change it to god knows what; the article changes the next
time someone edits it, and nobody is able to tell why it happened. It's fun.
:))
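The post leaves the exact server-side mangling open ("god knows what"). One common failure of this kind is double encoding, assumed here purely for illustration: the browser sends UTF-8 bytes, some layer reads them as Latin-1 and stores the result, and the stored text is sent back through the same path on the next edit. The function name `one_round_trip` is hypothetical.

```python
# Hypothetical illustration of an article drifting between edits:
# the browser submits UTF-8 bytes, but a server-side layer assumes
# Latin-1 and stores the mis-decoded text, which is re-sent next edit.

def one_round_trip(text: str) -> str:
    sent = text.encode("utf-8")       # browser submits UTF-8 bytes
    return sent.decode("iso-8859-1")  # server side reads them as Latin-1


text = "é"
for edit in range(1, 4):
    text = one_round_trip(text)
    # The stored text gets longer on every edit, with nobody touching it.
    print(edit, len(text), repr(text))
```

After the first round "é" has become "Ã©", and every further edit doubles the damage, which matches the symptom of an article changing "the next time someone edits it".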

grin