-----Original Message-----
From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Brion Vibber
Sent: 24 November 2008 19:17
To: Wikimedia developers
Subject: Re: [Wikitech-l] UTF8 Normalization
Jared Williams wrote:
> I patched UtfNormal.php to use the new intl extension's normalization
> function, http://php.net/manual/en/normalizer.normalize.php
>
> But running UtfNormalTest.php causes 200+ errors, whilst the PHP
> normalization routines are error free.
>
> So I'm kind of at a standstill, wondering what the problem is.
>
> I guess WM uses utf8_normalize(), which is based on ICU like intl; does
> that have the same errors?
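One way to narrow down 200+ failures is to see exactly which conformance invariants break. The sketch below assumes the test data follows the Unicode NormalizationTest.txt layout (five columns c1 to c5 of hex code points per line, plus `#` comments and `@PartN` headers), and that UtfNormalTest.php checks the same invariants; that is an assumption, not something stated above. It uses Python's `unicodedata.normalize` as a stand-in, and `check_line` / `count_failing_lines` are hypothetical helper names. Any function with the same `(form, text)` shape can be passed in, so a different implementation can be compared against the same file.

```python
import unicodedata
from typing import Callable, Iterable, List

# A normalizer takes a form name ("NFC", "NFD", "NFKC", "NFKD") and a string.
Normalizer = Callable[[str, str], str]

# (form, source columns, expected column), per the NormalizationTest.txt header.
INVARIANTS = [
    ("NFC", ("c1", "c2", "c3"), "c2"),
    ("NFC", ("c4", "c5"), "c4"),
    ("NFD", ("c1", "c2", "c3"), "c3"),
    ("NFD", ("c4", "c5"), "c5"),
    ("NFKC", ("c1", "c2", "c3", "c4", "c5"), "c4"),
    ("NFKD", ("c1", "c2", "c3", "c4", "c5"), "c5"),
]


def parse_columns(line: str) -> List[str]:
    """Turn 'c1;c2;c3;c4;c5; # comment' into five Python strings."""
    data = line.split("#", 1)[0]
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]


def check_line(line: str, normalize: Normalizer = unicodedata.normalize) -> List[str]:
    """Return the invariants one data line violates (empty list if it passes)."""
    cols = dict(zip(("c1", "c2", "c3", "c4", "c5"), parse_columns(line)))
    failures = []
    for form, sources, want in INVARIANTS:
        for name in sources:
            got = normalize(form, cols[name])
            if got != cols[want]:
                failures.append(f"{form}({name}) gave {got!r}, expected {want}={cols[want]!r}")
    return failures


def count_failing_lines(lines: Iterable[str], normalize: Normalizer = unicodedata.normalize) -> int:
    """Count data lines with at least one failed invariant."""
    failing = 0
    for line in lines:
        stripped = line.strip()
        # Skip blank lines, comments, and "@PartN" section headers.
        if not stripped or stripped.startswith(("#", "@")):
            continue
        if check_line(stripped, normalize):
            failing += 1
    return failing
```

Usage would be along the lines of `count_failing_lines(open("NormalizationTest.txt", encoding="utf-8"))`; printing `check_line` for each failing line shows whether the errors cluster on particular characters or forms.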
It may be that intl is using an ICU library which supports an
older version of Unicode. UtfNormal is currently built using
the Unicode 5.1 files, as I recall, so will have the 5.1 test cases.
For the most part, this means that the newer version includes
mapping rules for newly added characters; normalization is
deliberately designed to be backward compatible, so existing
rules should never change.
(There might be a few exceptions changed as errata, though; I forget.)
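The version-mismatch explanation can be checked mechanically: if the normalizer's Unicode data is older than the test file, most failing lines should contain code points that its database still treats as unassigned (general category `Cn`), and a normalizer leaves those untouched. The sketch below is a hypothetical triage helper that uses Python's `unicodedata` database as a stand-in for ICU's; with intl, the same question would have to be asked of the ICU data it was built against. Note that `Cn` also covers permanent noncharacters such as U+FFFF, so this is a heuristic, not proof.

```python
import unicodedata
from typing import List


def unassigned_code_points(text: str) -> List[str]:
    """Code points this Python build's Unicode database does not know (category Cn)."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cn"]


def explain_failure(source_column: str) -> str:
    """Hypothetical triage message for one failing test input."""
    missing = unassigned_code_points(source_column)
    if missing:
        # Likely a character added after this database's Unicode version.
        return (f"contains {', '.join(missing)}, unassigned in Unicode "
                f"{unicodedata.unidata_version}; probably a newer character")
    # Every character is known, so suspect an erratum or a genuine bug.
    return "all characters assigned; check errata or the implementation"
```

If nearly every failure comes back as "contains ...", the errors point to outdated Unicode data rather than a bug in the patch; the remaining few are the candidates for errata.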