Category sorting in MediaWiki has always been done wrong.
Categories are not sorted alphabetically, but in Unicode order. I
don't know when or why DEFAULTSORT was introduced, but today it is
used mostly for sorting people by surname, e.g.
{{DEFAULTSORT:Wales, Jimmy}}, and for sorting other topics by a
word other than the first, so that "European Commission" is
sorted under C rather than E in
http://en.wikipedia.org/wiki/Category:Institutions_of_the_European_Union
If you look more closely at that category, you will see that some
items are still sorted under E, which probably means somebody forgot
to add DEFAULTSORT to them:
* European Court of Auditors (0)
* European Union Mission (1)
* European quarter of Brussels (1)
What's even more remarkable is that "quarter" is sorted after
"Union". This is because all lowercase letters sort after all the
uppercase letters in ASCII and Unicode. That is how broken
category sorting is in MediaWiki.
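To make the effect concrete, here is a small Python sketch. It
assumes that plain code-point order (which is what Python's default
string comparison uses) is a fair stand-in for the ordering described
above; it is an illustration, not MediaWiki's own code.

```python
# The three titles from the Institutions_of_the_European_Union example.
titles = [
    "European quarter of Brussels",
    "European Union Mission",
    "European Court of Auditors",
]

# Python compares str values by Unicode code point, so every uppercase
# ASCII letter (65-90) comes before every lowercase one (97-122).
code_point_order = sorted(titles)

# Case-insensitive comparison, closer to what a reader expects.
case_insensitive_order = sorted(titles, key=str.casefold)

print(code_point_order)
print(case_insensitive_order)
```

In code-point order "Union" (U = 85) lands before "quarter"
(q = 113); once case is ignored, "quarter" moves up between "Court"
and "Union".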
Another example of broken sorting is when whitespace is compared
to letters. In ASCII and Unicode, the space character (code point 32)
sorts ahead of all other printable characters. This means "Moon
illusion" sorts ahead of "Moonbow" in
http://en.wikipedia.org/wiki/Category:Moon
because the space before "illusion" is compared to the "b" in
"Moonbow". I'm not sure whether this is correct in English, but in
Swedish it is wrong; "bow" should sort before "illusion", regardless
of the whitespace.
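The same kind of sketch, again assuming code-point order as the
model, shows why the space decides the outcome and what happens when
it is ignored:

```python
titles = ["Moonbow", "Moon illusion"]

# The space (32) has a lower code point than the letter b (98).
print(ord(" "), ord("b"))

# Code-point order: "Moon illusion" wins at the fifth character.
code_point_order = sorted(titles)

# Dropping spaces before comparing: "b" < "i", so "Moonbow" comes first.
ignoring_spaces = sorted(titles, key=lambda t: t.replace(" ", ""))

print(code_point_order)
print(ignoring_spaces)
```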
There is a way to avoid all such problems, namely a more
aggressive use of DEFAULTSORT whose sort key lowercases all letters
except the initial one and removes all whitespace and all
commas. It would mean that almost every article needs a DEFAULTSORT.
In the examples above:
{{DEFAULTSORT:Walesjimmy}}
{{DEFAULTSORT:Europeancourtofauditors}}
{{DEFAULTSORT:Europeanunionmission}}
{{DEFAULTSORT:Europeanquarterofbrussels}}
{{DEFAULTSORT:Moonillusion}}
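A bot would need a rule for turning a title (or an existing sort key
such as "Wales, Jimmy") into one of these keys. Below is a minimal
sketch of that rule as stated above; the function names
proposed_sort_key and defaultsort_tag are hypothetical, and the sketch
only handles the rule itself, not editing pages.

```python
import re


def proposed_sort_key(text: str) -> str:
    """Hypothetical helper: remove all whitespace and commas, keep the
    first character as is, and lowercase everything after it."""
    squeezed = re.sub(r"[\s,]+", "", text)
    if not squeezed:
        return squeezed
    return squeezed[0] + squeezed[1:].lower()


def defaultsort_tag(sort_key: str) -> str:
    """Hypothetical helper: wrap a sort key in the DEFAULTSORT magic word."""
    return "{{DEFAULTSORT:" + sort_key + "}}"


sources = [
    "Wales, Jimmy",
    "European Court of Auditors",
    "European Union Mission",
    "European quarter of Brussels",
    "Moon illusion",
]
for source in sources:
    print(defaultsort_tag(proposed_sort_key(source)))
```

This prints exactly the five DEFAULTSORT lines listed above. Sorting
the resulting keys in plain code-point order then gives "court",
"quarter", "union" for the EU items, and puts "Moonbow" before
"Moonillusion".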
This can be done with bots, for sure, if we agree that it should
be done. Is this something we should strive for? Has any language
of Wikipedia (or Wikinews or...) already started to do this?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se