Re: [Wikitech-l] Binary vs. non-binary strings (was: What is required to "fix search"?)

15 Apr 2006


Erik Moeller wrote:
...
> On 4/13/06, Gregory Maxwell gmaxwell@gmail.com wrote:
...
>> AFAIR, most string matches in mysql are case insensitive, which would
>> mean that we could have indexed case insensitive matches quickly...
>> but I'm guessing that our use of binary fields for titles (which is
>> required because no version of mysql has complete UTF-8 support) most
>> likely breaks that.
>
> Yes, they are, and yes, it does. Could someone explain what the exact
> reason is that we're using varchar binary in MediaWiki for page
> titles? I've been using regular varchars for my WiktionaryZ tables,
> and so far it seems to work fine with UTF-8. Where exactly does a
> non-binary varchar break?

It could break string matching, but would definitely break sorting. (Sorting by
codepoint may suck, but at least it's predictable.)

More generally, deliberately choosing a non-binary collation which applies to a
*different character set* from the one you're really using seems pretty silly.
You get unpredictable, incorrect sorting and potentially have strings rejected
as invalid.

...
> Might it make sense to use non-binary strings and use binary
> comparisons instead where case-sensitivity is required?

When MySQL finally reaches the mid-1990s and supports full Unicode, we'll be
able to do that.

-- brion vibber (brion @ pobox.com)
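
To make the trade-off discussed above concrete, here is a rough sketch in Python
(not MySQL, and not MediaWiki code). It assumes that a binary column compares
and sorts by raw UTF-8 bytes, and it uses str.casefold() as a stand-in for a
case-insensitive collation. The sample titles and the helper names
binary_sort, binary_equal and caseless_equal are made up for illustration.

```python
# Hypothetical sample titles; not taken from any real wiki.
titles = ["apple", "Zebra", "Éclair", "zebra", "Apple", "éclair"]


def binary_sort(strings):
    """Sort by UTF-8 byte sequence, roughly what a binary column does."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def binary_equal(a, b):
    """Byte-for-byte comparison: case-sensitive and accent-sensitive."""
    return a.encode("utf-8") == b.encode("utf-8")


def caseless_equal(a, b):
    """Approximation of a case-insensitive match (an assumption, not
    the actual rule set of any particular MySQL collation)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # UTF-8 byte order matches codepoint order, so this is predictable,
    # but all uppercase ASCII sorts before lowercase, and accented
    # letters sort after every ASCII letter.
    print(binary_sort(titles))
    print(binary_equal("Apple", "apple"))
    print(caseless_equal("Apple", "apple"))
```

The byte-order sort is stable and encoding-independent, which is the
"predictable" part; whether it is the order a human reader expects is a
separate question.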
