Re: [Wikitech-l] More aggressive DEFAULTSORT

15 May 2009

On Fri, May 15, 2009 at 4:22 AM, Tisza Gergő <gtisza(a)gmail.com> wrote:
...
> Would it be very expensive to have a separate (namespace, title, sortkey) table,
> and join on that for queries that need sorting?
You would have to scan the *entire* table you're joining from (which
may be hundreds of millions of rows).  Not a possibility.
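To make the scan argument concrete, here is a toy in-memory model of the two layouts. It is a sketch under stated assumptions, not MediaWiki code: the function names and data shapes are hypothetical. When the sortkey sits in the category index, the first page of a category is a slice of an already-ordered list; when the sortkey lives in a separate (namespace, title) table, every member has to be joined to its key before the first few can be chosen.

```python
import heapq

# Hypothetical in-memory model of two layouts; not MediaWiki code.

def first_page_inline(index_by_cat, cat, limit):
    """Layout A: sortkey lives in the category index, which is kept sorted
    by (category, sortkey). A page of results reads only `limit` entries."""
    page = index_by_cat.get(cat, [])[:limit]
    return page, len(page)  # (results, index entries touched)

def first_page_joined(members_by_cat, sortkey_table, cat, limit):
    """Layout B: sortkeys live in a separate (namespace, title) -> sortkey
    table. Nothing orders the members by sortkey, so every member must be
    joined to its key before the first `limit` can be chosen."""
    members = members_by_cat.get(cat, [])
    keyed = [(sortkey_table[ns_title], ns_title) for ns_title in members]
    return heapq.nsmallest(limit, keyed), len(keyed)
```

Both functions return the same page, but for a 10,000-member category and a limit of 200, layout A touches 200 entries and layout B touches all 10,000. Scale the category up to millions of rows and the join cost grows with it, independent of the page size.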

On Fri, May 15, 2009 at 5:47 AM, Tisza Gergő <gtisza(a)gmail.com> wrote:
...
> Coding the first or second type of collation rule seems relatively simple, and
> already a huge gain. (Also, RFC 3454 might be worth checking out as it has
> language-independent rules for more than diacritics.)
I agree.
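The thread does not spell out what the "first or second type" of rule is; assuming it means simple language-independent folding (case and diacritics), a sortkey function along those lines might look like the sketch below. It uses Python's standard `stringprep` module, which exposes the RFC 3454 tables: characters in table B.1 are "mapped to nothing" (soft hyphen, zero-width joiner and similar) and table B.2 gives case folding. Diacritics are then dropped by compatibility decomposition (NFKD) and removal of combining marks. The function name `make_sortkey` is hypothetical.

```python
import stringprep
import unicodedata

def make_sortkey(title):
    """Hypothetical language-independent sortkey: RFC 3454 mapping
    (table B.1 'mapped to nothing', table B.2 case folding), then
    diacritic stripping via compatibility decomposition."""
    mapped = []
    for ch in title:
        if stringprep.in_table_b1(ch):
            continue  # e.g. U+00AD SOFT HYPHEN, U+200D ZERO WIDTH JOINER
        mapped.append(stringprep.map_table_b2(ch))  # case folding; may expand
    decomposed = unicodedata.normalize("NFKD", "".join(mapped))
    # Drop characters with a nonzero combining class, e.g. U+0308 DIAERESIS.
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

With this key, "Ärger", "Straße", "zebra" and "Zoë" sort in that order, because their keys are "arger", "strasse", "zebra" and "zoe". This is deliberately crude: it ignores language-specific rules (Swedish sorts "ä" after "z", for instance), which is exactly the trade-off of the simple rule types.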

...
> You can have a separate raw_sortkey column if that's a large concern.
That would still mean an UPDATE of many millions of rows.  Plus you'd
add another column to a table that's already very large --
categorylinks is ~40,000,000 rows on enwiki, and that's an extra 40m
varchar(255)s clogging up the buffer pool even though they're never
going to be used except for the occasional update.
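For a rough sense of scale, the back-of-envelope calculation below estimates the raw data an extra VARCHAR(255) column would add across 40 million rows. It assumes a 1-byte length prefix per value (as for a column whose maximum byte length fits in 255, e.g. a single-byte character set) and ignores page, row-header and index overhead; the 20-byte average key is a hypothetical figure, not a measurement.

```python
ROWS = 40_000_000  # categorylinks rows on enwiki, per the message

def column_bytes(rows, avg_len, length_prefix=1):
    """Raw data bytes for one VARCHAR column: each value stores its bytes plus
    a length prefix. Ignores page, row-header and index overhead."""
    return rows * (avg_len + length_prefix)

GIB = 2**30
worst = column_bytes(ROWS, 255)   # upper bound: every key is 255 bytes
typical = column_bytes(ROWS, 20)  # hypothetical 20-byte average key
print(f"worst case: {worst / GIB:.1f} GiB, typical: {typical / GIB:.2f} GiB")
```

This prints about 9.5 GiB for the worst case and 0.78 GiB for the assumed average. Even the smaller figure is memory the buffer pool would spend on a column that is read only when the sortkey rules change.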

...
> Anyway, this is the same for any solution that does not rely on MySQL collation: when
> the rules change, you need to update the relevant column in the database.
Correct.  In fact, when MySQL's rules change you also have to rebuild
the index, AFAIK.
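When the rules do change, the update can at least be done incrementally. The sketch below uses SQLite as a stand-in for MySQL and a simplified, hypothetical schema: `cl_from`, `cl_to` and `cl_sortkey` echo MediaWiki's categorylinks columns, while `cl_raw` stands in for the raw_sortkey idea quoted above. It walks the table in primary-key order, recomputes each key with a supplied function, writes only rows whose key actually changed, and commits each batch separately so that no single transaction holds locks on millions of rows.

```python
import sqlite3

# Simplified, hypothetical schema. cl_from, cl_to and cl_sortkey echo
# MediaWiki's categorylinks; cl_raw stands in for the raw_sortkey idea.
SCHEMA = """
CREATE TABLE categorylinks (
    cl_from    INTEGER NOT NULL,
    cl_to      TEXT    NOT NULL,
    cl_raw     TEXT    NOT NULL,
    cl_sortkey TEXT    NOT NULL,
    PRIMARY KEY (cl_from, cl_to)
)
"""

def rebuild_sortkeys(conn, keyfunc, batch_size=1000):
    """Recompute cl_sortkey = keyfunc(cl_raw) in primary-key order,
    one short transaction per batch; return the number of rows changed."""
    changed = 0
    last_from, last_to = -1, ""
    while True:
        rows = conn.execute(
            "SELECT cl_from, cl_to, cl_raw, cl_sortkey FROM categorylinks "
            "WHERE cl_from > ? OR (cl_from = ? AND cl_to > ?) "
            "ORDER BY cl_from, cl_to LIMIT ?",
            (last_from, last_from, last_to, batch_size),
        ).fetchall()
        if not rows:
            return changed
        updates = []
        for cl_from, cl_to, raw, old in rows:
            new = keyfunc(raw)
            if new != old:  # skip rows whose key is already current
                updates.append((new, cl_from, cl_to))
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "UPDATE categorylinks SET cl_sortkey = ? "
                "WHERE cl_from = ? AND cl_to = ?",
                updates,
            )
        changed += len(updates)
        last_from, last_to = rows[-1][0], rows[-1][1]
```

Keyset pagination (resuming after the last primary key seen) keeps each batch cheap, unlike OFFSET-based paging, which rescans skipped rows. It does not avoid the underlying cost the message describes: every row still has to be read once, and any index on the key column is updated for every changed row.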

...
> What are the chances that we get decent MySQL collation in the close future
> (say, next few years)?
If we don't upgrade, I'd say about 0%.  :)  Even if we do, there are
still the uniqueness problems, and the non-BMP problem.  So not very
good, I'd say, for our purposes.  (That's not to say MySQL collation
isn't decent for other purposes.)
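The two remaining problems can be illustrated with small checks. This is a sketch, and the reading of "uniqueness problems" is an assumption: it takes the phrase to mean that a collation which folds case or accents makes distinct titles compare equal, so a UNIQUE index on the collated column would reject one of them. The non-BMP check reflects that characters above U+FFFF take 4 bytes in UTF-8, while MySQL's `utf8` character set stores at most 3 bytes per character. Both function names are hypothetical, and `str.casefold` merely stands in for a case-insensitive collation.

```python
from collections import defaultdict

def non_bmp_chars(text):
    """Characters above U+FFFF; each needs 4 bytes in UTF-8, which MySQL's
    3-byte-per-character 'utf8' character set cannot store."""
    return [c for c in text if ord(c) > 0xFFFF]

def collisions(titles, key):
    """Group distinct titles that a folding key (a stand-in for a collation)
    treats as equal. Under a UNIQUE index on the collated column, all but
    one title in each group would be rejected."""
    groups = defaultdict(set)
    for t in titles:
        groups[key(t)].add(t)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

For example, `collisions(["Apple", "apple", "Pear"], str.casefold)` reports `{"apple": ["Apple", "apple"]}`, two titles that MediaWiki can treat as distinct pages but a case-insensitive unique index cannot.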
