Hi, I patched UtfNormal.php to use the new intl extension's normalization function, http://php.net/manual/en/normalizer.normalize.php
But running UtfNormalTest.php produces 200+ errors, whilst the existing PHP normalization routines are error-free.
So I'm kind of at a standstill, wondering what the problem is. I guess WM uses utf8_normalize(), which is based on ICU like intl; does that have the same errors?
Jared
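For reference, a test like UtfNormalTest.php is presumably driven by Unicode's NormalizationTest.txt, where each data line has five columns c1..c5 and the NFC invariants are c2 == NFC(c1) == NFC(c2) == NFC(c3) and c4 == NFC(c4) == NFC(c5). The sketch below shows that NFC check in Python, with the standard `unicodedata` module standing in for the normalizer under test. This is an assumption for illustration; it is not the intl extension, utf8_normalize(), or UtfNormal's own code, and the helper names are hypothetical.

```python
import unicodedata

def parse_test_line(line):
    """Parse one NormalizationTest.txt data line into the five columns c1..c5.

    Returns None for blank lines, comment-only lines and "@Part" headers.
    """
    data = line.split("#", 1)[0].strip()
    if not data or data.startswith("@"):
        return None
    fields = data.split(";")[:5]
    # Each field is a space-separated list of hex code points.
    return tuple("".join(chr(int(cp, 16)) for cp in f.split()) for f in fields)

def nfc(s):
    # Stand-in normalizer (assumption); swap in the implementation being tested.
    return unicodedata.normalize("NFC", s)

def check_nfc(cols, normalize=nfc):
    """True if the NFC invariants from NormalizationTest.txt hold for this line."""
    c1, c2, c3, c4, c5 = cols
    return (c2 == normalize(c1) == normalize(c2) == normalize(c3)
            and c4 == normalize(c4) == normalize(c5))

def count_failures(lines, normalize=nfc):
    """Count data lines whose NFC invariants fail."""
    failures = 0
    for line in lines:
        cols = parse_test_line(line)
        if cols is not None and not check_nfc(cols, normalize):
            failures += 1
    return failures

sample = [
    "@Part0 # Specific cases",
    "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE",
]
print(count_failures(sample))                         # 0
print(count_failures(sample, normalize=lambda s: s))  # 1: identity does not compose D + U+0307
```

Passing a different `normalize` callable is what would let the same data be run against two implementations and the failure counts compared.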
Jared Williams wrote:
> I patched UtfNormal.php to use the new intl extension's normalization function, http://php.net/manual/en/normalizer.normalize.php
> But running UtfNormalTest.php produces 200+ errors, whilst the existing PHP normalization routines are error-free.
> So I'm kind of at a standstill, wondering what the problem is. I guess WM uses utf8_normalize(), which is based on ICU like intl; does that have the same errors?
It may be that intl is using an ICU library which supports an older version of Unicode. UtfNormal is currently built using the Unicode 5.1 files, as I recall, so it will have the 5.1 test cases.
For the most part, this means that the newer version includes mapping rules for newly-added characters -- normalization is deliberately designed to be compatible and existing rules should never change. (Though there might be a few exceptions changed as errata, I forget.)
-- brion
-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Brion Vibber
Sent: 24 November 2008 19:17
To: Wikimedia developers
Subject: Re: [Wikitech-l] UTF8 Normalization
Ah, the php.net-provided intl is built with ICU 3.6, which supports Unicode 5.0.
Jared
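With a version gap identified (ICU 3.6 supports Unicode 5.0, while UtfNormal's test data is 5.1), one way to check Brion's explanation is to see whether every failing case involves a character the older tables don't know about. Below is a rough Python sketch, assuming the failing cases have already been collected as tuples of strings. Python only bundles its current Unicode tables plus a frozen 3.2.0 copy (`unicodedata.ucd_3_2_0`), so 3.2.0 stands in here for "the older version"; checking against 5.0 for real would need age data such as the UCD's DerivedAge.txt. The function names and sample cases are hypothetical.

```python
import unicodedata

# Python bundles only its current Unicode tables plus a frozen 3.2.0 copy.
# ucd_3_2_0 is a stand-in for "the older Unicode version the normalizer knows";
# it is NOT Unicode 5.0.
OLD_UCD = unicodedata.ucd_3_2_0

def newer_chars(text, old_ucd=OLD_UCD):
    """Characters in text that are unassigned (category Cn) in old_ucd
    but assigned in the interpreter's current Unicode tables."""
    return [ch for ch in text
            if old_ucd.category(ch) == "Cn" and unicodedata.category(ch) != "Cn"]

def triage(failing_cases, old_ucd=OLD_UCD):
    """Split failing cases (tuples of strings) into those that contain
    characters newer than old_ucd and those that do not."""
    explained, unexplained = [], []
    for case in failing_cases:
        if any(newer_chars(s, old_ucd) for s in case):
            explained.append(case)
        else:
            unexplained.append(case)
    return explained, unexplained

# Hypothetical failing cases, in the c1..c5 column layout of NormalizationTest.txt.
balinese = ("\u1B06", "\u1B06", "\u1B05\u1B35", "\u1B06", "\u1B05\u1B35")  # Balinese, added in Unicode 5.0
latin = ("\u1E0A", "\u1E0A", "D\u0307", "\u1E0A", "D\u0307")              # already present in 3.2
explained, unexplained = triage([balinese, latin])
print(len(explained), len(unexplained))  # 1 1
```

Anything left in `unexplained` would point at a genuine bug or an errata change rather than at characters missing from the older tables.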
wikitech-l@lists.wikimedia.org