On Tue, May 12, 2009 at 4:38 PM, Brion Vibber <brion@wikimedia.org> wrote:
> - Collation use for sorting needs to be double-checked to confirm it
>   wouldn't interfere with present uniqueness constraints
Since cl_sortkey isn't part of any unique key, this doesn't appear to be an issue for this use. Of course, uniqueness is an issue for every other sorted list of titles, but those can't have custom sort keys specified to begin with and don't seem to be included in this proposal. Perhaps they should be, though. In that case we'd probably end up needing an extra column in every single table that includes the page title, just for sorting (but we'd be able to use flexible algorithms to generate the sort key, rather than being stuck with MySQL's collations).
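As a rough illustration of what generating the sort key in application code could look like, here is a minimal Python sketch. The function name make_sort_key and the rule it applies (strip accents, fold case) are assumptions for the example only; it is not the Unicode Collation Algorithm and not anything MediaWiki currently does.

```python
import unicodedata


def make_sort_key(title: str) -> str:
    """Hypothetical sort key: strip accents and fold case (illustration only)."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks, leaving only the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so "Zebra" and "zebra" sort together.
    return base.casefold()


titles = ["Zebra", "apple", "Éclair", "eclipse"]

# Sort on the generated key, using the raw title as a tiebreaker so the
# order is deterministic when two titles share a key.
by_key = sorted(titles, key=lambda t: (make_sort_key(t), t))
print(by_key)
```

The point of the sketch is only that the key is an ordinary string computed before the row is written, so the database can keep comparing it bytewise while the rule that produced it can be as language-aware as we like.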
> - Multilingual sites possibly not well served by table-wide
>   language-specific coding
Sorting with one of MySQL's utf8 collations would be a lot better than binary sorting for any site, I'm pretty sure. (I assume the utf8 collations sort sanely and not according to codepoint.)
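For reference, this small Python sketch shows what binary ordering does to a mixed list of titles. It says nothing about how any particular MySQL collation behaves; it only demonstrates the bytewise baseline being compared against. The sample titles are made up.

```python
titles = ["apple", "Zebra", "Éclair", "eclipse"]

# Binary sorting: compare the raw UTF-8 bytes of each title.
binary_order = sorted(titles, key=lambda t: t.encode("utf-8"))

# Python compares str values by codepoint, which gives the same order
# as comparing their UTF-8 encodings bytewise.
codepoint_order = sorted(titles)

print(binary_order)
```

Uppercase ASCII sorts before lowercase, and anything outside ASCII sorts after all of it, so "Zebra" lands ahead of "apple" and the accented title ends up last.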
> Doing our own localized sort key encoding and adding another indexed column to sort on would avoid some dependency issues but has its own deployment and maintenance difficulties.
You don't need another column for categorylinks; you can use the existing cl_sortkey, so that should be relatively easy to deploy. It doesn't help with non-category use cases, of course.
> It would also be possible to use a separate column for the collated sorting while using MySQL 4.1+'s native collations, if the uniqueness constraints are a problem, but this is still dependent on rolling out an upgrade from 4.0.
In that case we may as well make it like cl_sortkey and populate it ourselves, surely.
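A minimal sketch of that arrangement, using Python's built-in sqlite3 module as a stand-in for MySQL: the table page_demo, its columns, and the sort_key rule are all hypothetical. Uniqueness stays on the raw title, and the separately populated key is used only for ordering.

```python
import sqlite3
import unicodedata


def sort_key(title):
    # Hypothetical key: strip accents and fold case (illustration only).
    nfd = unicodedata.normalize("NFD", title)
    return "".join(c for c in nfd if not unicodedata.combining(c)).casefold()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_demo ("
    " title TEXT NOT NULL UNIQUE,"  # uniqueness is enforced on the raw title
    " sortkey TEXT NOT NULL)"       # populated by the application, not the DB
)
conn.execute("CREATE INDEX page_demo_sortkey ON page_demo (sortkey, title)")

for title in ["Zebra", "Eclair", "Éclair", "apple"]:
    conn.execute(
        "INSERT INTO page_demo (title, sortkey) VALUES (?, ?)",
        (title, sort_key(title)),
    )

# Two titles share the key "eclair", which is fine: only title is unique.
rows = [r[0] for r in conn.execute(
    "SELECT title FROM page_demo ORDER BY sortkey, title")]
print(rows)
```

Because the unique constraint never sees the sort key, two titles that collate identically can coexist, and changing the key-generation rule later means repopulating one column rather than touching the constraint.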