Re: [Wikitech-l] More aggressive DEFAULTSORT

15 May 2009


On Fri, May 15, 2009 at 4:22 AM, Tisza Gergő <gtisza@gmail.com> wrote:
...
> Would it be very expensive to have a separate (namespace, title, sortkey) table,
> and join on that for queries that need sorting?

You would have to scan the *entire* table you're joining from (which
may be hundreds of millions of rows).  Not a possibility.

On Fri, May 15, 2009 at 5:47 AM, Tisza Gergő <gtisza@gmail.com> wrote:
...
> Coding the first or second type of collation rule seems relatively simple, and
> already a huge gain. (Also, RFC 3454 might be worth checking out as it has
> language-independent rules for more than diacritics.)

I agree.
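
To make "relatively simple" concrete, here is a minimal sketch of a language-independent sortkey. The rule types referred to above are elided from this message, so this is only an assumption about what the simple rules might be: strip diacritics and fold case. The names `fold` and `sort_key` are made up for the illustration and are not MediaWiki functions. The raw title is kept as a tiebreaker so that titles that fold to the same string still sort deterministically.

```python
import unicodedata


def fold(title: str) -> str:
    """Assumed 'simple' rule: decompose, drop combining marks, case-fold."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_key(title: str) -> tuple:
    """Folded form first, raw title second so distinct titles never tie."""
    return (fold(title), title)


titles = ["Zebra", "Éclair", "eclair", "apple", "Eclair"]
print(sorted(titles, key=sort_key))
# ['apple', 'Eclair', 'eclair', 'Éclair', 'Zebra']
```

This does nothing for language-specific orderings; it only covers the diacritics-and-case part.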
...
> You can have a separate raw_sortkey column if that's a large concern.

That would still mean an UPDATE of many millions of rows.  Plus you'd
add another column to a table that's already very large --
categorylinks is ~40,000,000 rows on enwiki, and that's an extra 40m
varchar(255)s clogging up the buffer pool even though they're never
going to be used except for the occasional update.
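
For a rough sense of scale, a back-of-the-envelope calculation follows. Only the row count comes from the figures above; the 15-byte average length is a purely hypothetical input, and the estimate ignores per-row and page overhead, so treat it as a lower bound for whatever average you plug in. The function name `extra_bytes` is made up for this sketch.

```python
ROWS = 40_000_000  # approximate categorylinks row count on enwiki, per the text above


def extra_bytes(rows: int, avg_len_bytes: int) -> int:
    """Payload bytes for one extra string column; storage overhead is ignored."""
    return rows * avg_len_bytes


# 15 is a hypothetical average; 255 is every value full with single-byte characters.
for avg in (15, 255):
    print(f"avg {avg:>3} bytes -> {extra_bytes(ROWS, avg) / 1e9:.1f} GB")
# avg  15 bytes -> 0.6 GB
# avg 255 bytes -> 10.2 GB
```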
...
> Anyway,
> this is the same for any solution that does not rely on MySQL collation: when
> the rules change, you need to update the relevant column in the database.

Correct.  In fact, when MySQL's rules change you also have to rebuild
the index, AFAIK.
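
What "update the relevant column" could look like, as a sketch only: walk the table in primary-key order and rewrite the sortkey in small batches, one transaction per batch, so no single UPDATE touches millions of rows. It uses SQLite so it runs standalone. The integer `id` key and the `raw_sortkey` column are simplifying assumptions (the real categorylinks table has no such `id`), and `rebuild_sortkeys` is a made-up name, not an existing maintenance script.

```python
import sqlite3
from typing import Callable


def rebuild_sortkeys(conn: sqlite3.Connection, rule: Callable[[str], str],
                     batch_size: int = 1000) -> int:
    """Recompute cl_sortkey from raw_sortkey, committing after each batch."""
    last_id, updated = 0, 0
    while True:
        rows = conn.execute(
            "SELECT id, raw_sortkey FROM categorylinks "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return updated
        conn.executemany(
            "UPDATE categorylinks SET cl_sortkey = ? WHERE id = ?",
            [(rule(raw), row_id) for row_id, raw in rows],
        )
        conn.commit()  # keep each transaction small
        last_id = rows[-1][0]
        updated += len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks "
    "(id INTEGER PRIMARY KEY, raw_sortkey TEXT, cl_sortkey TEXT)"
)
conn.executemany(
    "INSERT INTO categorylinks (raw_sortkey, cl_sortkey) VALUES (?, ?)",
    [(t, t) for t in ["Zebra", "Apple", "Mango"]],
)
conn.commit()

# str.casefold stands in for whatever the new collation rule would be.
print(rebuild_sortkeys(conn, str.casefold, batch_size=2))  # 3
```

Batching spreads the write load out, but every row still gets rewritten, and the index on the sortkey is churned along with it.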
...
> What are the chances that we get decent MySQL collation in the close future
> (say, next few years)?

If we don't upgrade, I'd say about 0%.  :)  Even if we do, there are
still the uniqueness problems, and the non-BMP problem.  So not very
good, I'd say, for our purposes.  (That's not to say MySQL collation
isn't decent for other purposes).
