On 14 May 2011 06:33, Andrew Dunbar hippytrail@gmail.com wrote:
I'm almost positive Azeri has the same dotless i issue, and perhaps some of the other Turkic languages of Central Asia do too. One solution is to do accent/diacritic normalization as part of the canonicalization as well.
It's good to think about these issues beforehand. But we already do enough mindless killing of diacritics, and it doesn't work across all languages. In Finnish, saa and sää are different words, and ä is a letter in its own right, not a letter "a" with something added to it.
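
To make the problem concrete, here is a minimal sketch in Python of the kind of blanket diacritic stripping I mean. It is only an illustration, not what our code actually does: the function name strip_diacritics is made up for this example, and it assumes the stripping is done by decomposing to NFD and dropping combining marks.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Finnish: two different words collapse into one after stripping.
finnish_collision = strip_diacritics("sää") == strip_diacritics("saa")

# Turkish/Azeri: dotless ı (U+0131) has no decomposition, so it is left alone,
# while dotted capital İ (U+0130) decomposes and loses its dot.
dotless_unchanged = strip_diacritics("ı") == "ı"
dotted_capital_folded = strip_diacritics("İ") == "I"

print(finnish_collision, dotless_unchanged, dotted_capital_folded)
```

So this kind of normalization merges saa and sää, and it still doesn't make ı and i compare equal, which means it wouldn't even solve the dotless i issue on its own.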
-Niklas