Hoi,
The introduction demonstrates that Unicode does indeed deal with collation.
When you look at the characters in Unicode, you will find that the Unicode standard is very much a work in progress. When you look at the CLDR, you will find that it is also very much a work in progress. However, for many languages the collation has been well defined and is unlikely to change. For African languages, there is a project called Afrigen that is collecting the information needed for inclusion in the CLDR.
I am not impressed by your argument that you will have to rebuild the sorting order whenever a collation order changes. First, standards like the CLDR have releases, so these iterations only happen when a new release becomes available. Second, it seems weird to me to refuse to implement an improved collation order when the current one is wrong in the first place.
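To make the rebuild question concrete, here is a minimal Python sketch of why stored sort keys go stale when a collation release changes. The two rule strings are hypothetical orderings invented for illustration, not actual CLDR data, and make_sort_key and build_index are made-up helper names. A real system would take its keys from a collation library rather than from a lookup table like this.

```python
# Hypothetical rule sets: NOT real CLDR data, just two orderings of the same letters.
RULES_V1 = "abcdefghijklmnopqrstuvwxyz"  # plain Latin order
RULES_V2 = "abcdefghiyjklmnopqrstuvwxz"  # 'y' moved to sort right after 'i'


def make_sort_key(rules):
    """Return a function mapping a word to a tuple of letter ranks under `rules`."""
    rank = {ch: i for i, ch in enumerate(rules)}

    def sort_key(word):
        return tuple(rank[ch] for ch in word.lower())

    return sort_key


def build_index(titles, rules):
    """Simulate a database index: stored (sort_key, title) pairs, kept sorted."""
    key = make_sort_key(rules)
    return sorted((key(t), t) for t in titles)


titles = ["jonas", "yla", "ieva", "zita"]

index_v1 = build_index(titles, RULES_V1)  # built under the old release
index_v2 = build_index(titles, RULES_V2)  # rebuilt under the new release

order_v1 = [title for _, title in index_v1]
order_v2 = [title for _, title in index_v2]

# If the stored keys differ, the old index is stale and must be rebuilt.
stale = index_v1 != index_v2
print(order_v1, order_v2, stale)
```

In this sketch the same titles come out in a different order under the second rule set, so the stored keys have to be regenerated, but only when a release actually changes the rules for that locale.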
I have always been told that we develop and implement open source in order to create open content using open standards. In my opinion you have not provided any argument for why another approach would be preferable. In this case the CLDR is an applicable open standard.
When, as a consequence of an improved collation order for particular languages, we have to rebuild databases every now and again, that is tough, but it needs to be done. It is all part of normal and acceptable system management.
Thanks,
GerardM
http://o2.it46.se/afrigen/statistics.php
2009/5/13 Domas Mituzas midom.lists@gmail.com
Hi!
This is not CLDR; this is the general collation algorithm.
http://cldr.unicode.org/index/cldr-spec/collation-guidelines
CLDR is a repository/process for LDMLs (that's what I was referring to when I mentioned people sending us that data, in case the current data is wrong or nonexistent). Currently it has mistakes and multiple versions even for the same locales; it doesn't seem to be too stable or correct.
An example:
http://unicode.org/cldr/data/common/collation/lt.xml?rev=1.26&content-ty... ;-)
Do note that such unstable changes require database rebuilds at each iteration. So we'd have to have someone reviewing it all, comparing it with different sources, and then pushing it once every few years into some data staging environment where we do data conversions all the time? :) riiight...
Domas
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l