So, you want to ignore the problem of different collation orders for
different languages?
...or perhaps we can use the proper locale data, which will tell us
how to sort for each language.
This would also give us information on dates, which are, for example,
not quite right in some of the files, especially for underdeveloped
Wikipedias, and information on number and time formats (such as "PM").
It would also provide us with the appropriate native name of each
language to use in interwiki links (iirc).
Mark
On Tue, 22 Feb 2005 01:39:32 +0100, Tomasz Wegrzanowski
<taw(a)users.sf.net> wrote:
On Mon, Feb 21, 2005 at 09:22:17AM -0800, Ray Saintonge wrote:
Petr Kadlec wrote:
Note that there is a bug report about that in Bugzilla:
http://bugzilla.wikimedia.org/show_bug.cgi?id=164
You can at least throw in a vote. :-)
I can sympathize with the idea, but one has to keep in mind that the
sort order should vary between one language and another. In English we
would alphabetize "æ" as though it were "ae", while Danish treats it as
a separate letter tagged on at the end of the alphabet.
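
To make the "æ" example concrete, here is a minimal Python sketch of two
sort keys. The function names and the small Danish alphabet table are
hypothetical and written by hand for illustration only; they are not
taken from MediaWiki or from real locale data, which is much richer.

```python
# Hand-written tailoring table (an assumption for this sketch).
# Danish places "æ", "ø", "å" after "z".
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
DANISH_RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def english_key(word):
    """English convention: sort "æ" as though it were "ae"."""
    return word.lower().replace("æ", "ae")


def danish_key(word):
    """Danish convention: rank each letter by its place in the alphabet.

    Characters missing from the table are ranked after everything else.
    """
    unknown = len(DANISH_ALPHABET)
    return [DANISH_RANK.get(ch, unknown) for ch in word.lower()]


words = ["aero", "æble", "zebra", "af"]
print(sorted(words, key=english_key))
print(sorted(words, key=danish_key))
```

The same four words come out in a different order under each key, which
is the whole problem: the list alone does not say which order is right.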
We can do a lot better than Unicode binary sort order.
The sort order should not vary between Latin-script languages, because it
should be script-dependent, not language-dependent (sorted words don't have
to come from the same language, but merely have to be in the same script;
think proper names). It's very unfortunate that there are different
traditions for sorting the Latin writing system.
In most languages the right place for base letter X with diacritical mark Y
is somewhere after plain letter X (and we can choose an order of Ys that
generates few conflicts).
In some it sorts the same way as X, in which case sorting it after X
is still much better than at the end of the alphabet.
The languages with such letters at the end, or with even weirder orders,
are few, and we won't break any more than we currently do if we adopt
"base, then base+diacritics" sorting.
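
A rough Python sketch of what "base, then base+diacritics" could look
like, using only the standard unicodedata module. The function name is
made up for this example, and using Unicode canonical decomposition to
find the base letter is an assumption on my part, not something the
software does today. Letters with no canonical decomposition (such as
"æ", "ø" or "ł") are not handled and still fall back to code point order.

```python
import unicodedata


def base_then_diacritics_key(word):
    """Sort key: each letter is ranked by (base letter, combining marks).

    A plain letter has an empty mark string, so it sorts before the same
    letter carrying any diacritic, and both sort before the next letter.
    """
    key = []
    # Compose first so that each accented letter is a single character.
    for ch in unicodedata.normalize("NFC", word.lower()):
        decomposed = unicodedata.normalize("NFD", ch)
        key.append((decomposed[0], decomposed[1:]))
    return key


words = ["zebra", "émile", "eve", "edith", "fox"]
print(sorted(words))                                # binary order
print(sorted(words, key=base_then_diacritics_key))  # base, then base+marks
```

With binary order "émile" lands after "zebra"; with this key it lands
after the plain "e" words and before "fox". The order among different
marks on the same base is just their code point order here, which is
where one would plug in a better-chosen order of Ys.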