Hoi,
The point of this thread is that collation is not working well. A hack
has been proposed, and in my opinion it makes better sense to apply what is
effectively the standard for collation. The CLDR is managed with very much
the same care as is usual for standards. Several of the people who I know
are involved with the CLDR are also involved in standards and RFCs.
Your assertion that data cannot be part of a standard ... what is this
based on? I know that ISO is considering the implementation of something
they call "data as a standard".
You mention that you prefer binary sorts. Is this an approach where one size
should fit all? Because if it is, you have an approach that is broken by
design. Collation can differ from language to language. As a matter of
fact, the original Dutch collation is no longer used because of the tyranny
of people who did not want to take the conventions of "other" languages into
consideration.
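
To make the difference concrete, here is a minimal Python sketch. It is not
CLDR data and not what MySQL does; rough_key and letters_after_z are
hypothetical names I made up for illustration. It compares plain code point
order (roughly what a binary sort gives) with a crude language-aware key,
assuming that a Swedish-style order places å, ä and ö after z while a
German-style order files ä with a.

```python
import unicodedata

def binary_sort(words):
    # Plain code point order: uppercase before lowercase, accented letters last.
    return sorted(words)

def rough_key(word, letters_after_z=""):
    # Hypothetical stand-in for a real collation key, NOT CLDR tailoring data.
    # Letters listed in letters_after_z rank after 'z'; all other letters
    # are compared by their unaccented, case-folded base letter.
    key = []
    for ch in word.casefold():
        if ch in letters_after_z:
            key.append((1, ch))
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            key.append((0, base))
    return (key, word)  # the original word breaks ties

def rough_sort(words, letters_after_z=""):
    return sorted(words, key=lambda w: rough_key(w, letters_after_z))

# \u00e4 is "ä"; escapes keep the example independent of the file encoding.
words = ["zon", "\u00e4pple", "apa", "Zebra"]

print(binary_sort(words))                          # ['Zebra', 'apa', 'zon', 'äpple']
print(rough_sort(words))                           # ['apa', 'äpple', 'Zebra', 'zon']
print(rough_sort(words, "\u00e5\u00e4\u00f6"))     # ['apa', 'Zebra', 'zon', 'äpple']
```

The same four words come out in three different orders, and only the last
two would look right to a reader of the respective language. A real
implementation would take its rules from the CLDR rather than from a
hand-written key like this one.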
When it comes to collation, I would not mind if Oracle had people whose day
job is to implement proper collation for the many languages that exist. When
the WMF needs collation, it would use this functionality, which I would
expect to be available as standard in MySQL.
You have to appreciate that Wikipedia is currently localised in over 300
languages and you just cannot shrug off the complexities that come with
this. If it takes a few people working full time just to support our
languages properly, it would be completely justified.
Thanks,
GerardM
2009/5/13 Domas Mituzas <midom.lists(a)gmail.com>
Hi!
I have always been told that we develop and implement open source in
order to create open content using open standards. In my opinion you have
not provided any argument why any other approach is preferable. In this
case the CLDR is an applicable open standard.
I wonder why you call it 'a standard'; markup is standard, data is
not. This is what Wikipedia says:
"The Common Locale Data Repository Project, often abbreviated as
CLDR, is a project of the Unicode Consortium to provide locale data
in the XML format for use in computer applications. CLDR contains
locale specific information that an operating system will typically
provide to applications."
I'd prefer using binary sort, then we don't have to change anything,
and everything is done extremely efficiently :-)
Anyway, there're lots and lots of implementation details/problems.
It is easy to point at a collection of data, it is not that easy to
merge it into a production environment, handle data conflicts, staging,
etc.
Do you want to get a few people working full time just on this?
Shrug,
Domas
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l