On Mon, Jul 31, 2006 at 04:46:58PM +0200, Warhog (aja Julian Fleischer) wrote:
> Unicode support in PHP is also lacking: you can't save a PHP script as UTF-8 or even UTF-16. But a programming language more or less doesn't even need to support it, as Unicode is designed to be compatible with older charsets. You can also handle strings as binary data; no problem then.
> For PHP there is also a multibyte extension (mbstring), and some functions are available that let you convert between ISO-8859-X and UTF-8 (e.g. for XML support). I don't know how far MediaWiki makes use of such features, as I guess it uses UTF-8 only, which is compatible, so there is no need for PHP to be "compliant" or anything like that.
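To make the two points above concrete (pure-ASCII text has identical bytes in ASCII, ISO-8859-1 and UTF-8, and converting between ISO-8859-X and UTF-8 is just a decode-then-encode step), here is a minimal sketch in Python 3 rather than PHP. ISO-8859-1 stands in for "ISO-8859-X", and the helpers `latin1_to_utf8` and `utf8_to_latin1` are hypothetical names for illustration; in PHP the same job is done by functions such as iconv() or mb_convert_encoding().

```python
# Pure-ASCII text has the same bytes in ASCII, ISO-8859-1 and UTF-8,
# which is why byte-oriented code often "just works" with UTF-8.
ascii_text = "Hello, wiki"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8") == ascii_text.encode("latin-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: ISO-8859-1 bytes -> UTF-8 bytes (decode, then re-encode)."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Hypothetical helper: UTF-8 bytes -> ISO-8859-1 bytes.
    Raises UnicodeEncodeError for characters ISO-8859-1 cannot represent."""
    return data.decode("utf-8").encode("iso-8859-1")


latin1 = "Grüße".encode("iso-8859-1")  # b'Gr\xfc\xdfe', one byte per character
utf8 = latin1_to_utf8(latin1)          # b'Gr\xc3\xbc\xc3\x9fe', u and sharp s take two bytes each
print(len(latin1), len(utf8))          # 5 7

# Round trip back to ISO-8859-1 is lossless for these characters.
assert utf8_to_latin1(utf8) == latin1

# Treating UTF-8 as opaque binary data: byte searches for ASCII still work,
# but the byte length no longer equals the character count.
assert b"Gr" in utf8
assert len(utf8) == 7 and len(utf8.decode("utf-8")) == 5
```

The last two assertions show both sides of the "handle strings as binary data" approach: storage and ASCII-based matching are unaffected, while anything that counts or slices characters has to decode first.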
PHP is supposedly planning to incorporate ICU (the International Components for Unicode library, which Python can also use through an external binding), which has some reasonable Unicode support for regexen, at some point in the future. Ruby is reportedly integrating Oniguruma (a regular expression engine) by the end of the year, which will apparently provide substantial Unicode support. Oniguruma can already be used as an external library, of course, and someone started working on ICU support for Ruby a while ago too (as an external library, though ICU is an external library in Python too). Perl probably has several dozen ways to support Unicode on CPAN.
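As a rough illustration of what "Unicode support for regexen" means in practice, here is a sketch using Python 3's built-in re module. This is not ICU or Oniguruma, and not what any of these languages offered in 2006; it only shows how the same \w+ pattern behaves over decoded text versus raw UTF-8 bytes.

```python
import re

text = "naïve café Grüße"

# str pattern: in Python 3, \w is Unicode-aware by default,
# so accented letters and sharp s count as word characters.
unicode_words = re.findall(r"\w+", text)
# → ['naïve', 'café', 'Grüße']

# bytes pattern over the UTF-8 encoding: \w only matches ASCII [a-zA-Z0-9_],
# so each multibyte character splits a word apart.
byte_words = re.findall(rb"\w+", text.encode("utf-8"))
# → [b'na', b've', b'caf', b'Gr', b'e']

# re.ASCII gives a str pattern the same ASCII-only semantics.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
# → ['na', 've', 'caf', 'Gr', 'e']
```

Even here the support is partial: re has no \p{...} Unicode property classes, which is the kind of gap an engine like ICU or Oniguruma is meant to fill.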
. . . but as far as I'm aware, there's no such thing as a language that provides full native Unicode support. The best we could do is use an external library, which is something you can do with Ruby anyway.