2009/5/11 Lars Aronsson <lars(a)aronsson.se>:
Category sorting in MediaWiki has always been done wrong.
Categories are not sorted alphabetically, but in Unicode order.
Sure thing. See
https://bugzilla.wikimedia.org/show_bug.cgi?id=164
with 95 votes…
Another example of broken sorting is when whitespace is compared
to letters. In ASCII and Unicode, whitespace (position 32) sorts
ahead of all printable characters. This means Moon illusion sorts
ahead of Moonbow in
http://en.wikipedia.org/wiki/Category:Moon
because the whitespace before "illusion" is compared to the b in
Moonbow. I'm not sure if this is correct in English, but in
Swedish it is wrong; bow should sort before illusion, regardless
of the whitespace.
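
The Moon example can be reproduced outside MediaWiki. The following is only a minimal Python sketch of plain code-point comparison, not MediaWiki's actual sorting code; the variable names are made up for illustration.

```python
titles = ["Moonbow", "Moon illusion"]

# Plain code-point comparison: the space (U+0020) sorts before "b" (U+0062),
# so "Moon illusion" comes first.
by_code_point = sorted(titles)

# Ignoring whitespace compares "Moonillusion" with "Moonbow",
# so "b" < "i" decides and "Moonbow" comes first.
ignoring_spaces = sorted(titles, key=lambda t: t.replace(" ", ""))

print(by_code_point)
print(ignoring_spaces)
```
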
In Czech, this is correct; Czech collation works on individual words.
As you see, the rules are language-specific.
There is a way to avoid all such problems, namely by a more
aggressive use of DEFAULTSORT that removes from sorting all upper
case letters (except the initial one), all whitespace and all
commas.
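
A rough sketch of such a sort key, in Python rather than wikitext. It assumes that "removing" upper case letters means folding them to lower case; the function name default_sort_key is hypothetical and is not a MediaWiki API.

```python
def default_sort_key(title: str) -> str:
    """Hypothetical key: drop whitespace and commas, lowercase all but the first character."""
    stripped = "".join(ch for ch in title if not ch.isspace() and ch != ",")
    return stripped[:1] + stripped[1:].lower()


titles = ["Moon illusion", "Moonbow"]

# Keys are "Moonillusion" and "Moonbow", so "Moonbow" now sorts first.
sorted_titles = sorted(titles, key=default_sort_key)
print(sorted_titles)
```

In a wiki page, the resulting key would be what gets passed to DEFAULTSORT by hand.
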
The problem is much more difficult than that (see the linked bug).
Commas, case sensitivity and whitespace are a trivial problem in
comparison with non-ASCII letters.
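
A small Python sketch of why non-ASCII letters are the harder part. The three Czech words are chosen only for illustration; the expected Czech order places "č" directly after "c" and before "d".

```python
# "čaj" and "dům", written with escapes so the precomposed characters are unambiguous.
words = ["cibule", "\u010daj", "d\u016fm"]

# Code-point order: "č" is U+010D, far above every ASCII letter,
# so "čaj" ends up after "dům" instead of between "cibule" and "dům".
code_point_order = sorted(words)
print(code_point_order)
```

Stripping whitespace, commas and case does nothing for this; it needs a language-specific collation.
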
-- [[cs:User:Mormegil | Petr Kadlec]]