On Tue, May 12, 2009 at 4:38 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
> * Collation use for sorting needs to be double-checked to confirm it
>   wouldn't interfere with present uniqueness constraints
Since cl_sortkey isn't part of any unique key, this appears not to be
an issue for this use. Of course, it's an issue for every other
sorted list of titles, but those can't have custom sort keys specified
to begin with and don't seem to be included in this proposal. Perhaps
they should be, though. In that case we'd probably end up needing an
extra column in every single table that includes the page title, just
for sorting (but we'd be able to use flexible algorithms to generate
the sort key, rather than being stuck with MySQL's).
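To make "generate the sort key ourselves" concrete, here is a rough sketch in Python rather than MediaWiki code. The helper name `make_sort_key` and the folding rules (strip accents, fold case) are assumptions chosen for illustration only; a real implementation would need proper per-language collation rules.

```python
import unicodedata

def make_sort_key(title: str) -> str:
    """Hypothetical sort-key generator; not MediaWiki's actual algorithm."""
    # Decompose accented characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks so that "É" sorts together with "E".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that "Zebra" and "apple" compare alphabetically.
    folded = base.casefold()
    # Append the original title as a tie-breaker between titles that fold alike.
    return folded + "\x00" + title

titles = ["Zebra", "apple", "Éclair", "eagle"]
# The key would be stored in an indexed column; sorting on it is then a plain
# binary comparison, whatever the database's own collation support.
rows = sorted((make_sort_key(t), t) for t in titles)
ordered = [t for _, t in rows]
```

The point of the design is that the key is computed once at write time, so the database only ever has to compare bytes.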
> * Multilingual sites possibly not well served by table-wide
>   language-specific coding
utf8 sorting would be a lot better than binary sorting for any site,
I'm pretty sure. (I assume utf8 sorts sanely and not according to
codepoint.)
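For comparison, this is what binary sorting does to a handful of titles. It is plain Python with made-up sample titles and says nothing about what any particular MySQL collation does; it only shows that comparing UTF-8 bytes gives the same order as comparing code points.

```python
titles = ["apple", "Zebra", "Éclair", "eagle"]

# Binary sort: compare the raw UTF-8 bytes of each title.
by_bytes = sorted(titles, key=lambda t: t.encode("utf-8"))

# Python compares str values by code point, so this is code point order.
by_codepoint = sorted(titles)

# Both put uppercase ASCII first, then lowercase ASCII, then non-ASCII.
print(by_bytes)
```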
> Doing our own localized sort key encoding and adding another indexed
> column to sort on would avoid some dependency issues but has its own
> deployment and maintenance difficulties.
You don't need another column for categorylinks, you can use the
existing cl_sortkey, so that should be relatively easy to deploy. It
doesn't help with non-category use cases, of course.
> It would also be possible to use a separate column for the collated
> sorting while using MySQL 4.1+'s native collations, if the uniqueness
> constraints are a problem, but this is still dependent on rolling out an
> upgrade from 4.0.
In that case we may as well make it like cl_sortkey and populate it
ourselves, surely.