Hoi,

The point of this thread is that collation is not working well. A hack has been proposed, and in my opinion it makes more sense to apply what is effectively the standard for collation. The CLDR is managed with the same care that is usual for standards. Several of the people I know who are involved with the CLDR are also involved in standards and RFCs. Your assertion that data cannot be part of a standard: what is it based on? I know that ISO is considering something they call "data as a standard".
You mention that you prefer binary sorts. Is this an approach where one size should fit all? Because if it is, you have an approach that is broken by design. Collation can differ from language to language. As a matter of fact, the original Dutch collation is no longer used because of the tyranny of people who did not want to take the conventions of "other" languages into consideration.
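To make the point about one-size-fits-all sorting concrete, here is a toy Python sketch. It is not CLDR data and not how MySQL implements collation; the two per-language rules are simplified assumptions (German folding ä/ö/ü onto a/o/u, Swedish placing å, ä, ö after z), and the function names are hypothetical. It only shows that a binary (code point) sort and two language-aware orders disagree on the same list of words.

```python
# Toy illustration only: these rules are simplified assumptions, not CLDR data.
WORDS = ["Zebra", "Äpfel", "apfel", "Öl", "Ofen"]

# Simplified German-style rule: umlauts sort as their base letter,
# with the original (lowercased) word as a tie-breaker.
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u"}

def german_key(word):
    lower = word.lower()
    base = "".join(GERMAN_FOLD.get(c, c) for c in lower)
    return (base, lower)

# Simplified Swedish-style rule: å, ä, ö are separate letters after z.
# Only handles a-z plus å, ä, ö; any other character raises ValueError.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]

if __name__ == "__main__":
    # Binary sort compares raw Unicode code points: uppercase before
    # lowercase, and Ä/Ö after every ASCII letter.
    print("binary: ", sorted(WORDS))
    print("german: ", sorted(WORDS, key=german_key))
    print("swedish:", sorted(WORDS, key=swedish_key))
```

Neither toy rule is "right" in general; the point is that each language expects its own order, which a single binary sort cannot provide.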
When it comes to collation, I would not mind if Oracle had people whose day job is implementing proper collation for the many languages that exist. When the WMF needs collation, it would then use this functionality, which I would expect to be available as standard in MySQL.
You have to appreciate that Wikipedia is currently localised in over 300 languages, and you just cannot shrug away the complexities that come with this. If it takes a few people with a full-time job just to support our languages properly, that would be completely justified.

Thanks,
GerardM
2009/5/13 Domas Mituzas midom.lists@gmail.com
Hi!
I have always been told that we develop and implement open source in order to create open content using open standards. In my opinion you have not provided any argument for why any other approach is preferable. In this case the CLDR is an applicable open standard.
I wonder why you call it 'a standard': markup is a standard, data is not. This is what Wikipedia says:
"The Common Locale Data Repository Project, often abbreviated as CLDR, is a project of the Unicode Consortium to provide locale data in the XML format for use in computer applications. CLDR contains locale specific information that an operating system will typically provide to applications. "
I'd prefer using binary sort; then we don't have to change anything, and everything is done extremely efficiently :-) Anyway, there are lots and lots of implementation details/problems.
It is easy to point at a collection of data; it is not that easy to merge it into a production environment, handle data conflicts, staging, etc. Do you want to have a few people working full-time just on this?
Shrug, Domas
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l