On Mon, Jul 31, 2006 at 04:46:58PM +0200, Warhog (aka Julian Fleischer) wrote:
Unicode support in PHP is also lacking: you can't save a PHP script as
UTF-8 or even UTF-16. But a programming language more or less does not
even need to support it, as Unicode is designed to be compatible with
older charsets. You can also handle strings as binary data, no problem
then.
For PHP there is also a multibyte extension, and there are some functions
available which allow you to convert between ISO-8859-X and UTF-8 (for
XML support, e.g.). I don't know how far MediaWiki makes use of such
features, as it uses UTF-8 only, I guess -- which is compatible, and
therefore there is no need for PHP to be "compliant" or something like that.
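
To make the "strings as binary data" and charset-conversion points above
concrete, here is a minimal sketch in Python (chosen only for
illustration; this is not PHP's or MediaWiki's code). The sample string
and the helper name latin1_to_utf8 are my own assumptions. It shows that
converting ISO-8859-1 to UTF-8 is straightforward, but that treating
UTF-8 text as plain bytes changes what "length" and "slice" mean.

```python
# Hypothetical sample text; any string with non-ASCII characters would do.
text = "Grüße"

# The same five characters in two encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (illustrative helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Character count versus byte counts in each encoding.
print(len(text), len(utf8_bytes), len(latin1_bytes))

# The conversion round-trips to the same UTF-8 bytes.
print(latin1_to_utf8(latin1_bytes) == utf8_bytes)

# Slicing raw bytes can cut a multibyte sequence in half; the broken
# tail is replaced with U+FFFD when decoded with errors="replace".
truncated = utf8_bytes[:3]
print(repr(truncated.decode("utf-8", errors="replace")))
```

So byte-level handling works as long as you only store and pass strings
along; counting, truncating, or matching characters is where an
encoding-aware layer starts to matter.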
PHP is supposedly planning to incorporate ICU (International Components
for Unicode), which has some reasonable Unicode support for regexen, at
some point in the future.
Ruby is reportedly integrating Oniguruma (a regular expression engine)
by the end of the year, which will apparently provide substantial
Unicode support -- though Oniguruma can be used now as an external
library, of course, and someone started working on ICU support for Ruby
a while ago too (as an external library -- though of course it's an
external library in Python too). Perl, of course, probably has several
dozen ways to support Unicode in CPAN.
. . . but as far as I'm aware, there's no such thing as a language that
provides full native Unicode support. The best we could do is use an
external library, which is something you can do with Ruby anyway.
--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
unix virus: If you're using a unixlike OS, please forward
this to 20 others and erase your system partition.