On Nov 17, 2003, at 20:06, Lars Aronsson wrote:
> Erik Moeller wrote:
>> perhaps in the 5% range. Given that a single edit by such a person will break an entire page, it might not be so wise to switch (but perhaps I'm missing something -- is Meta running UTF-8?).
> Would it be possible to let the database run on UTF-8 internally, but to let the PHP script analyze and convert data to and from certain browsers? Perhaps the majority of users are using UTF-8-capable browsers, so the conversion would use a minimum of resources.
Certainly possible, as long as care is taken to keep round-trips clean.
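To make the round-trip concern concrete, here is a minimal sketch in Python (not the wiki's PHP code). It assumes the legacy browser is served ISO-8859-1 and that unrepresentable characters are sent as numeric character references; the names `to_legacy`, `from_legacy` and `round_trips` are hypothetical.

```python
import re

LEGACY = "iso-8859-1"  # assumed charset for the non-UTF-8 browser

_REF = re.compile(r"&#(\d+);")


def to_legacy(text: str) -> bytes:
    """Encode for the legacy browser; characters outside the charset
    become numeric character references such as &#26481;."""
    return text.encode(LEGACY, errors="xmlcharrefreplace")


def _unescape(match: re.Match) -> str:
    code = int(match.group(1))
    # Leave out-of-range references untouched rather than failing.
    return chr(code) if code <= 0x10FFFF else match.group(0)


def from_legacy(data: bytes) -> str:
    """Decode what the legacy browser posted back and undo the references."""
    return _REF.sub(_unescape, data.decode(LEGACY))


def round_trips(text: str) -> bool:
    """True if the text survives a trip out to the browser and back."""
    return from_legacy(to_legacy(text)) == text
```

In this sketch `round_trips("Ærøskøbing 東京")` holds, but text that already contains a literal `&#65;` comes back as `A`, which is exactly the kind of unclean round-trip to watch for; a real conversion would need to escape such literals on the way out.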
Another possibility is simply to 'blacklist' known problem browsers: print a notice on the edit page warning that the browser may have problems, with a link to better browsers, much as we now warn on long pages that some browsers may have trouble. (Though in that case we aren't checking specific browsers.)
The main problem browser these days is Internet Explorer for Mac; it's years out of date and the most recent version still doesn't grok UTF-8 for editing. The most recent Macs ship with Safari as the default, but most existing Macs out there are going to have IE or (shudder) Netscape 4.x as the default browser.
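A browser blacklist of this kind could be sketched roughly as follows, in Python for illustration. The User-Agent patterns are assumptions based on typical strings sent by IE for Mac and Netscape 4.x, not a tested list, and `needs_utf8_warning` is a hypothetical name.

```python
import re

# Assumed patterns, for illustration only.
PROBLEM_BROWSERS = [
    re.compile(r"MSIE \d+\.\d+; Mac"),    # Internet Explorer for Mac
    re.compile(r"^Mozilla/4\.\d+ \["),    # Netscape 4.x, e.g. "Mozilla/4.79 [en] (...)"
]


def needs_utf8_warning(user_agent: str) -> bool:
    """True if the edit page should show the 'your browser may have
    problems' notice for this User-Agent header."""
    return any(p.search(user_agent) for p in PROBLEM_BROWSERS)
```

The check only decides whether to print the notice; it does not block the edit, which matches the "warn and link to better browsers" idea rather than a hard lockout.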
> All I know is that MySQL has better UTF-8 support from version 4.1.x, as described in chapter 9 (http://www.mysql.com/doc/en/Charset.html). The same goes for Perl version 5.8, but what about PHP?
PHP currently has pretty much no UTF-8 support aside from some conversion functions. Strings are treated as arbitrary-length byte sequences, and we've got some custom functions to deal with case changing and the like.
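The reason byte-oriented strings need custom case functions can be shown with a small Python sketch, where `bytes` stands in for a byte-sequence string. The function names are hypothetical and this is not the Utf8Case code itself.

```python
def bytewise_upper(raw: bytes) -> bytes:
    """Case-change that sees only bytes: only ASCII letters are affected."""
    return raw.upper()


def charwise_upper(raw: bytes) -> bytes:
    """Case-change that decodes UTF-8 first, so non-ASCII letters change too."""
    return raw.decode("utf-8").upper().encode("utf-8")


raw = "ærø".encode("utf-8")
# Three characters, but five bytes: æ and ø take two bytes each in UTF-8.
assert len("ærø") == 3 and len(raw) == 5
```

On this input the byte-wise version only uppercases the ASCII `r`, leaving `æ` and `ø` alone, while the character-aware version uppercases all three letters.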
There are some multibyte character set support functions which may or may not be suitable for replacing the Utf8Case functions; that should get looked into.
-- brion vibber (brion @ pobox.com)