2009/5/11 Lars Aronsson lars@aronsson.se:
Category sorting in MediaWiki has always been done wrong. Categories are not sorted alphabetically, but in Unicode order.
Sure thing. See https://bugzilla.wikimedia.org/show_bug.cgi?id=164 with 95 votes…
Another example of broken sorting is when whitespace is compared to letters. In ASCII and Unicode, whitespace (position 32) sorts ahead of all printable characters. This means Moon illusion sorts ahead of Moonbow in http://en.wikipedia.org/wiki/Category:Moon because the whitespace before "illusion" is compared to the b in Moonbow. I'm not sure if this is correct in English, but in Swedish it is wrong; bow should sort before illusion, regardless of the whitespace.
In Czech, this is correct; Czech collation works on individual words. As you see, the rules are language-specific.
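To make the two orderings concrete, here is a minimal Python sketch (not MediaWiki code) using the two titles from the example. The helper `ignore_spaces` is a hypothetical name introduced only for illustration; it approximates the "whitespace does not count" rule, not any real collation.

```python
titles = ["Moonbow", "Moon illusion"]

# Plain code-point comparison: the space (U+0020) is compared to "b" (U+0062)
# at the fifth character, and the space is smaller, so "Moon illusion" wins.
codepoint_order = sorted(titles)

# Hypothetical helper: drop spaces and fold case before comparing,
# so "moonbow" is compared to "moonillusion" and "b" < "i" decides.
def ignore_spaces(title):
    return title.replace(" ", "").lower()

spaceless_order = sorted(titles, key=ignore_spaces)

print(codepoint_order)
print(spaceless_order)
```

Which of the two results is "right" depends on the language, which is the point being made here.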
There is a way to avoid all such problems, namely by a more aggressive use of DEFAULTSORT that removes from sorting all upper case letters (except the initial one), all whitespace and all commas.
The problem is much more difficult than that (see the linked bug). Commas, case sensitivity and whitespace are trivial problems compared with non-ASCII letters.
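A rough Python sketch of why such a normalization is not enough. The function `aggressive_sort_key` is a hypothetical name, and it reflects one reading of the proposal (lower-case everything after the first character, drop whitespace and commas); it is not how DEFAULTSORT itself works. The Czech titles are illustrative only.

```python
def aggressive_sort_key(title):
    # Assumed reading of the proposal: remove whitespace and commas,
    # keep the initial character as-is, lower-case the rest.
    stripped = "".join(ch for ch in title if not ch.isspace() and ch != ",")
    return stripped[:1] + stripped[1:].lower()

# The whitespace case is handled: "Moonbow" now precedes "Moon illusion".
moon = sorted(["Moon illusion", "Moonbow"], key=aggressive_sort_key)

# Non-ASCII letters are not: "Č" (U+010C) has a higher code point than "D",
# so "Česko" lands after "Dánsko", although Czech puts Č between C and D.
czech = sorted(["Česko", "Dánsko"], key=aggressive_sort_key)

print(moon)
print(czech)
```

The key is still compared by code point, so any letter outside ASCII ends up wherever its code point happens to fall, regardless of the language's alphabet.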
-- [[cs:User:Mormegil | Petr Kadlec]]