Hi,
I am really happy with your extensive description of why it is such a pain
in the arse. The situation is even worse: there are more Wikipedia languages
than there are languages with a proper CLDR description. It would be great
if we could strongly urge our language communities to verify, extend and
amend the CLDR. It would make a *practical* difference in
.
You are right that it is not an absolute roadblock for other languages to
have their Wikipedia. It is not. It is, however, amazing that we have a
Wikipedia in languages like Hindi and Malayalam. The problem for those
languages is even more basic: they still have problems with Unicode itself.
To appreciate this, compare the Indonesian Wikipedia with all the Wikipedias
of the Indian subcontinent. As Bahasa Indonesia is written in the Latin
script, it is that much easier to write articles in that language. As a
result, the Indonesian Wikipedia gets more traffic than all the Indian
Wikipedias combined.
In conclusion, we need to put genuine effort into supporting other scripts.
I understand that you are not volunteering. It would, however, be a project
that would make a big difference to many of our projects.
Thanks,
GerardM
On 10 June 2010 14:40, Domas Mituzas <midom.lists(a)gmail.com> wrote:
Hi!
Yes, it is a technical pain in the arse. The question is one of primacy: is
it more important to provide service, or are technical considerations most
important? Yes, we discussed this in the past, and we did not agree then
and we do not agree now.
Well, I agree that it might be a good idea to have language-specific
ordering; it's just that the costs are quite high and there aren't many
people eager to do the engineering part of such a project.
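To make "language-specific ordering" concrete, here is a toy Python sketch. It does not use CLDR data; the alphabets and key functions are simplified assumptions. It shows the same four words sorting differently under raw code-point order, a simplified German order (diacritics ignored at first, then used as a tie-breaker) and a simplified Swedish order, where å, ä and ö are separate letters after z.

```python
import unicodedata

words = ["Zebra", "apple", "Äpfel", "zoo"]

def codepoint_key(s: str) -> str:
    # Raw code-point order, roughly what a binary collation gives you.
    return s

def german_key(s: str) -> tuple:
    # Toy German dictionary order: drop diacritics and case for the primary
    # comparison, then break ties on the original string.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)

# Toy Swedish order: å, ä and ö are separate letters that sort after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(s: str) -> tuple:
    # Letters outside the toy alphabet sort after it, by code point.
    ranks = tuple(
        SWEDISH_ALPHABET.index(c) if c in SWEDISH_ALPHABET
        else len(SWEDISH_ALPHABET) + ord(c)
        for c in s.casefold()
    )
    return (ranks, s)

print(sorted(words, key=codepoint_key))  # ['Zebra', 'apple', 'zoo', 'Äpfel']
print(sorted(words, key=german_key))     # ['Äpfel', 'apple', 'Zebra', 'zoo']
print(sorted(words, key=swedish_key))    # ['apple', 'Zebra', 'zoo', 'Äpfel']
```

Three key functions give three different orders for the same list, which is exactly why one global collation cannot serve every language.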
CLDR isn't a panacea. It is a constantly evolving project, with inaccurate
stable versions (even for well-established languages like mine, heheh) and
various proposed/testing versions.
So, to adopt a CLDR-based flow and do it properly would mean an infinite
loop of:
1. Understanding which languages need a separate collation
2. Evaluating all available collations for a language, attracting input
from local communities and standardization bodies
3. Evaluating the algorithmic implications of the chosen collation, then
either approaching standards bodies to change it, simplifying it
internally (and forking), or implementing the algorithms in software
(though that is sometimes impossible to do efficiently)
4. Porting (3) into a backend of choice
5. Providing an upgrade path and conflict resolution method for existing
content
6. Providing a framework to do full index rebuilds and switchovers between
different collations (OK, this is probably a one-time engineering project,
albeit quite a complex one, as it has to keep (4) and (5) in mind)
7. Monitoring for new versions of collations :)
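Steps 5 and 6 hinge on knowing which stored sort keys were produced by which collation. A minimal sketch, assuming a hypothetical row layout that records the collation name next to each precomputed sort key (the `Row` class, `rebuild_sortkeys` and the collation names are illustrative, not anything MediaWiki has), is a rebuild that only rewrites stale rows:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Row:
    # Hypothetical row layout: the title, its precomputed sort key, and the
    # name of the collation that produced that key.
    title: str
    sortkey: str
    collation: str

def rebuild_sortkeys(rows: List[Row], make_key: Callable[[str], str],
                     collation: str) -> int:
    """Recompute sort keys for every row built under another collation.

    Returns how many rows were rewritten. Rows already on `collation` are
    skipped, so an interrupted rebuild can simply be run again.
    """
    changed = 0
    for row in rows:
        if row.collation != collation:
            row.sortkey = make_key(row.title)
            row.collation = collation
            changed += 1
    return changed

demo = [Row("Zebra", "Zebra", "binary-v1"), Row("apple", "apple", "binary-v1")]
print([r.title for r in sorted(demo, key=lambda r: r.sortkey)])  # ['Zebra', 'apple']
print(rebuild_sortkeys(demo, str.casefold, "casefold-v2"))       # 2
print([r.title for r in sorted(demo, key=lambda r: r.sortkey)])  # ['apple', 'Zebra']
```

Skipping rows that are already current makes the rebuild idempotent. A real backend would probably also need to keep old and new keys side by side until the switchover, which this sketch does not attempt.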
Multiply all that by the number of languages we have, and do note that there
are multiple sorting variants per language too (e.g. dictionary vs. phonebook
ordering in German).
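The German dictionary vs. phonebook difference can be shown with a toy sketch. It assumes the common simplification that dictionary order treats ä, ö and ü like a, o and u, while phonebook order expands them to ae, oe and ue; real CLDR tailorings are more involved, and the tables and `german_sort_key` here are illustrative only.

```python
# Toy transliteration tables (a simplified assumption of the two orders).
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def german_sort_key(word: str, table: dict) -> tuple:
    # Primary comparison on the case-folded, transliterated form;
    # ties are broken on the original spelling.
    return (word.casefold().translate(table), word)

names = ["Müller", "Muller", "Mueller", "Mutter"]
dictionary_order = sorted(names, key=lambda w: german_sort_key(w, DICTIONARY))
phonebook_order = sorted(names, key=lambda w: german_sort_key(w, PHONEBOOK))
print(dictionary_order)  # ['Mueller', 'Muller', 'Müller', 'Mutter']
print(phonebook_order)   # ['Mueller', 'Müller', 'Muller', 'Mutter']
```

The only difference is where "Müller" lands, but that is enough to make a list look wrong to a reader who expects the other convention.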
So yes, it would be fantastic to have that kind of functionality, but you'd
need quite a lot of engineering capacity to pull it off.
And if we get into implementation specifics: ordering rules are the same as
equality rules, which causes quite some confusion in some cases (and some
people will definitely want terms that sort the same but are not equal... :)
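A minimal sketch of the "ordering rules are equality rules" problem, assuming a hypothetical case- and accent-insensitive key `fold`: if a unique index compares titles with the same rules used for sorting, distinct titles collapse into one. Keeping equality exact and using the folded key only for ordering keeps them all, sorted together.

```python
import unicodedata

def fold(s: str) -> str:
    # Hypothetical case- and accent-insensitive collation key.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def unique_under_collation(titles):
    # Emulates a unique index that compares with the collation:
    # titles with the same folded key count as duplicates.
    seen = {}
    for t in titles:
        seen.setdefault(fold(t), t)
    return list(seen.values())

titles = ["Resume", "résumé", "RESUME"]
print(unique_under_collation(titles))  # ['Resume']; the other two are "duplicates"

# Alternative: exact (binary) equality, folded key used only for ordering.
print(sorted(set(titles), key=lambda t: (fold(t), t)))  # ['RESUME', 'Resume', 'résumé']
```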
Of course, we can use community-driven sortkey hacks for some features ;-)
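MediaWiki already lets editors override ordering by hand with `{{DEFAULTSORT:...}}` or a per-category sort key (`[[Category:...|key]]`). The function below is a hypothetical sketch of that idea, not MediaWiki's code: an editor-supplied key, when present, replaces the title for sorting.

```python
from typing import List, Optional, Tuple

def category_order(pages: List[Tuple[str, Optional[str]]]) -> List[str]:
    # Each page is (title, sortkey); sortkey is the editor-supplied override,
    # or None when the page relies on its title.
    def key(page):
        title, sortkey = page
        return ((sortkey or title).casefold(), title)
    return [title for title, _ in sorted(pages, key=key)]

pages = [
    ("Albert Einstein", "Einstein, Albert"),
    ("Niels Bohr", "Bohr, Niels"),
    ("Zebra", None),
    ("apple", None),
]
print(category_order(pages))  # ['apple', 'Niels Bohr', 'Albert Einstein', 'Zebra']
```

This sidesteps collation entirely for the pages editors touch, at the cost of relying on people to maintain the keys.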
I wonder how our English-language readers would react if the sort order
for their lists were wrong.
I guess it isn't absolutely tragic for others, as otherwise we wouldn't see
projects in other languages at all. Now that's a benchmark! ;-)
Domas
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l