As of r86257 [1], which will be deployed to Wikimedia wikis soon and
will be included in the 1.17 release, sortkeys output by
list=categorymembers and prop=categories are now encoded as
hexadecimal strings, so "FOO" becomes "464f4f".

As previously announced, sortkeys are no longer guaranteed to be
human-readable, and may in fact contain binary data (this will happen
when Wikimedia switches to the UCA/ICU collation). However, outputting
binary data, notably in XML, was problematic [2], so I decided to use
hexadecimal encoding. This means the sortkey as returned by the API is
now guaranteed not to be human-readable, even if the underlying
collation uses a human-readable format (such as the uppercase
collation currently in use on Wikimedia wikis). However, it will still
sort correctly: if A sorts before B in the binary format, that will
also be the case in the hexadecimal format.
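
To illustrate the ordering claim above, here is a minimal Python sketch. It assumes the API emits two lowercase hex digits per byte (as in the "FOO" -> "464f4f" example); the helper names encode_sortkey and decode_sortkey are made up for this illustration and are not part of MediaWiki.

```python
def encode_sortkey(raw: bytes) -> str:
    """Hex-encode a binary sortkey: two lowercase hex digits per byte."""
    return raw.hex()


def decode_sortkey(hex_sortkey: str) -> bytes:
    """Recover the binary sortkey from its hexadecimal form."""
    return bytes.fromhex(hex_sortkey)


# Sample keys, including a prefix pair and non-printable bytes.
raw_keys = [b"FOO", b"FO", b"BAR", b"\xff\x00", b"FOO\nFoo"]
hex_keys = [encode_sortkey(k) for k in raw_keys]

# Sorting the hex strings gives the same order as sorting the raw bytes,
# because every byte maps to a fixed-width pair of digits and the digits
# 0-9 sort before a-f in ASCII.
sorted_by_raw = sorted(raw_keys)
sorted_by_hex = [decode_sortkey(h) for h in sorted(hex_keys)]
assert sorted_by_raw == sorted_by_hex

print(encode_sortkey(b"FOO"))  # prints 464f4f
```

A client that needs the underlying bytes can decode the value as shown, but for sorting it should be enough to compare the hexadecimal strings directly.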

The following things changed:
* The 'sortkey' property in list=categorymembers and prop=categories
is now a hexadecimal string
* In prop=categories, clprop=sortkey will now also output the
'sortkeyprefix' property (human-readable part of the sortkey).
list=categorymembers already provided this through
cmprop=sortkeyprefix
* The format of cmcontinue has changed from type|pageid|rawsortkey to
type|hexsortkey|pageid. If you did not make any assumptions about the
format of cmcontinue and just passed back whatever you got in
query-continue, this won't affect you
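
As a rough sketch of the "just pass back whatever you got" approach from the last item, the Python loop below treats cmcontinue as an opaque value and never parses it. The function iter_category_members and its fetch argument are hypothetical: fetch stands for whatever the client uses to send query parameters to api.php and return the decoded JSON response. The response layout (a 'query-continue' object keyed by module name) is assumed here, not taken from this announcement.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical type: a function that sends the parameters to api.php
# and returns the decoded JSON response as a dict.
Fetch = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_category_members(fetch: Fetch, category: str) -> Iterator[Dict[str, Any]]:
    """Yield all members of a category, following query-continue blindly."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmprop": "title|sortkey|sortkeyprefix",
        "format": "json",
    }
    while True:
        data = fetch(params)
        yield from data["query"]["categorymembers"]

        # Assumed layout: {"query-continue": {"categorymembers": {"cmcontinue": "..."}}}
        cont = data.get("query-continue", {}).get("categorymembers")
        if not cont:
            break
        # Copy the continuation values back verbatim; never split or
        # interpret the type|hexsortkey|pageid string.
        params = {**params, **cont}
```

Because the continuation value is only copied, a loop like this should not care whether the server sends the old or the new cmcontinue format.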

Roan Kattouw (Catrope)

[1] https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWik…
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=28541