On Thu, Aug 22, 2002 at 10:11:41AM -0700, lcrocker@nupedia.com wrote:
The English Wikipedia, and the German one being tested now, are both ISO-8859-1, not UTF-8. UTF-8 will be needed for Polish and other languages. There won't be much software change involved; just telling MySQL to index the right way.
That may in fact involve defining our own character set for MySQL, one that describes the properties of the subset of UTF-8 covering English, German and Polish. Or is each Wikipedia going to get its own MySQL server? Anyway, I'll start asking around whether something like that already exists somewhere.
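To make the encoding issue concrete, here is a rough Python sketch (not anything from the Wikipedia code base) that checks which words can be stored as ISO-8859-1 at all. The sample words and the helper names `fits_latin1` and `utf8_length` are made up for illustration.

```python
# Illustrative only: sample words chosen for this sketch.
SAMPLES = {
    "English": "encyclopedia",
    "German": "Enzyklopädie",
    "Polish": "źródło",
}


def fits_latin1(text):
    """Return True if every character of text can be encoded in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def utf8_length(text):
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for language, word in SAMPLES.items():
    print(language, word, "latin-1 ok:", fits_latin1(word),
          "utf-8 bytes:", utf8_length(word))
```

The German word fits in ISO-8859-1 because the umlaut is part of that set; the Polish word does not, since characters such as "ź" and "ł" lie outside it. Sorting and indexing rules are a separate matter that this sketch does not touch.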
As for a special notation for accented characters, I'm not fond of the idea. Foreign users should have foreign keyboards. Others should still be able to enter accents by whatever means their OS and browser allow, and I'm not aware of any that don't have some feature for it.
All I know at the moment is that the request was made by a member of the German community. I don't know how many people asked for it, why they want it or how badly they need it, but I'll ask them. I'm a bit surprised that Magnus hasn't brought this up (I'm not German), but I have the impression he has been busy lately.
I don't like duplicating effort that should already be done elsewhere.
The question is not whether you would implement it, but only whether it would be OK to define some hooks so that they can implement it themselves, if they want to, without changing any common code.
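As a rough idea of what I mean by hooks, here is a minimal Python sketch. Everything in it is hypothetical: the names `register_input_hook` and `apply_input_hooks`, the per-language registry, and the quote-based umlaut notation are invented for illustration and are not what the German community actually asked for.

```python
# Hypothetical registry: language code -> list of text-transforming hooks.
_input_hooks = {}


def register_input_hook(language, hook):
    """Register a function that rewrites submitted text for one language."""
    _input_hooks.setdefault(language, []).append(hook)


def apply_input_hooks(language, text):
    """Run every hook registered for the language, in registration order."""
    for hook in _input_hooks.get(language, []):
        text = hook(text)
    return text


# Example hook a community could maintain itself (made-up notation:
# a vowel followed by a double quote stands for the umlaut).
def expand_quote_umlauts(text):
    for plain, accented in (('a"', "ä"), ('o"', "ö"), ('u"', "ü")):
        text = text.replace(plain, accented)
    return text


register_input_hook("de", expand_quote_umlauts)

print(apply_input_hooks("de", 'Mu"nchen'))  # hook applied
print(apply_input_hooks("en", 'Mu"nchen'))  # no hooks registered, unchanged
```

The common code would only need the one call to `apply_input_hooks`; a language with no hooks registered is left untouched.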
-- Jan Hidders