When using cmstartsortkey for a category, I get the impression the result is incorrect.
I requested:
http://en.wikipedia.org/w/api.php?cmprop=title%7Cids%7Csortkey%7Ctimestamp&a...
The way I understand it (and the way pywikipediabot uses it) this should give the disambiguation pages starting at Da. In reality, however, it gives them starting at D + <some unicode character beyond z>. Am I misunderstanding how this part of the API works, or is this a bug?
2011/4/7 Andre Engels andreengels@gmail.com:
> The way I understand it (and the way pywikipediabot uses it) this should give the disambiguation pages starting at Da.
It never listed disambig pages whose *title* starts at Da; it always listed disambig pages whose *sortkey* starts at Da. With the category collation rewrite, sortkeys are now in all uppercase (which is why you need DA to accomplish what you want), and they may have crazier binary formats in the future.
Roan Kattouw (Catrope)
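As a rough illustration of the advice above, the following Python sketch builds a list=categorymembers request whose start key is uppercased before it is sent. The helper name `categorymembers_url` and the category title "Category:Example" are assumptions made for illustration (the category in the original request is truncated and is not reproduced here), and the sketch only reflects the behavior described in this thread as of April 2011; since the sortkey format may change, treat it as a snapshot rather than a stable recipe.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def categorymembers_url(category, start_sortkey):
    """Build a list=categorymembers query that starts at the given sortkey."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "format": "json",
        # Hypothetical category title supplied by the caller.
        "cmtitle": category,
        # Same properties as in the original request.
        "cmprop": "title|ids|sortkey|timestamp",
        # Per the thread, sortkeys are stored uppercased, so the start key
        # is uppercased here to match them ("Da" becomes "DA").
        "cmstartsortkey": start_sortkey.upper(),
    }
    return API_ENDPOINT + "?" + urlencode(params)


# "Category:Example" is a placeholder, not the category from the thread.
url = categorymembers_url("Category:Example", "Da")
print(url)
```

Uppercasing on the client side keeps the caller's intent ("start at Da") while matching what the server compares against.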
On Thu, Apr 7, 2011 at 10:26 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
> 2011/4/7 Andre Engels andreengels@gmail.com:
>> The way I understand it (and the way pywikipediabot uses it) this should give the disambiguation pages starting at Da.
> It never listed disambig pages whose *title* starts at Da; it always listed disambig pages whose *sortkey* starts at Da. With the category collation rewrite, sortkeys are now in all uppercase (which is why you need DA to accomplish what you want), and they may have crazier binary formats in the future.
Thanks, that answers my question. Just go to uppercase :-)
On Thu, Apr 7, 2011 at 4:26 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
> With the category collation rewrite, sortkeys are now in all uppercase (which is why you need DA to accomplish what you want), and they may have crazier binary formats in the future.
I want to confirm this: are you saying that all the following sortkeys are now treated as identical?
"Article title" "Article Title" "ARTICLE TITLE" "ArTiClE TiTlE"
I'm asking because for a while we had to explicitly set the sortkey to "Article Title" to avoid sorting based on the capitalization of the second word.
Thanks for the info,
- Carl
2011/4/7 Carl (CBM) cbm.wikipedia@gmail.com:
> I want to confirm this: are you saying that all the following sortkeys are now treated as identical?
> "Article title" "Article Title" "ARTICLE TITLE" "ArTiClE TiTlE"
Yes, they're all normalized to "ARTICLE TITLE". This (uppercasing of everything) is a temporary solution until we can enable UCA (the Unicode Collation Algorithm), which will probably mangle things even more.
Roan Kattouw (Catrope)
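A minimal sketch of the normalization described above, assuming plain uppercasing is a fair approximation of the interim behavior. The function `normalize_sortkey` is hypothetical and is not MediaWiki's actual code; Python's `str.upper()` may differ from the server for some non-ASCII characters, and none of this models UCA.

```python
def normalize_sortkey(sortkey):
    # Approximates the interim "uppercase everything" rule from the thread.
    # This is an illustrative stand-in, not the server-side implementation.
    return sortkey.upper()


variants = ["Article title", "Article Title", "ARTICLE TITLE", "ArTiClE TiTlE"]

# Collect the distinct normalized keys; all four variants collapse to one.
normalized = {normalize_sortkey(v) for v in variants}
print(normalized)  # {'ARTICLE TITLE'}
```

Under this rule, setting the sortkey explicitly to "Article Title" just to control capitalization no longer changes the ordering.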