On 14 May 2011 06:33, Andrew Dunbar <hippytrail(a)gmail.com> wrote:
> I'm almost positive Azeri has the same dotless i issue and perhaps
> some of the other Turkic languages of Central Asia. One solution is to
> do accent/diacritic normalization too as part of the canonicalization.
It's a good thing to think about these beforehand. But we already do
enough mindless killing of diacritics. It doesn't work across all
languages. In Finnish saa and sää are different words and ä is not a
letter "a" with something added to it.
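The Finnish point can be shown with a minimal sketch of the usual naive diacritic stripping. It assumes the stripping is done by Unicode NFD decomposition followed by dropping every combining mark. `strip_diacritics` is a hypothetical helper written for illustration and is not taken from any existing canonicalization code. The sketch also shows that this kind of stripping does not settle the Turkic dotted/dotless i question either: the dotted capital İ loses its dot, while the dotless ı has no decomposition and passes through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical naive normalizer: decompose to NFD, then drop combining marks."""
    # NFD splits precomposed letters such as "ä" into a base letter plus a
    # combining mark (here U+0308 COMBINING DIAERESIS).
    decomposed = unicodedata.normalize("NFD", text)
    # Remove every nonspacing combining mark (general category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Finnish: two different words end up with the same canonical key.
print(strip_diacritics("saa"), strip_diacritics("sää"))    # saa saa
print(strip_diacritics("saa") == strip_diacritics("sää"))  # True

# Turkic i: "İ" (U+0130) decomposes to "I" + U+0307 and loses its dot,
# but "ı" (U+0131) has no decomposition and is left as is.
print(strip_diacritics("İ"), strip_diacritics("ı"))        # I ı
```

In other words, a letter like ä, which Finnish treats as a letter of its own, is indistinguishable from a decorated a once the text is in NFD form, so any rule that drops the marks merges distinct words.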
-Niklas
--
Niklas Laxström