[Mediawiki-l] Sphinx is doing case-sensitive searches and I want case-INsensitive

David Benfell benfell at parts-unknown.org
Wed Nov 17 09:21:59 UTC 2010


Hi all,

While experimenting with the Sphinx search extension, I've discovered
that searches are now case-sensitive.  From what I can see, they
aren't supposed to be.

The possibly relevant lines from sphinx.conf:

        # uncomment next 2 lines to allow wildcard (*) searches
        min_infix_len = 1
        enable_star = 1

        # charset encoding type
        charset_type    = utf-8

        # charset definition and case folding rules "table"
        charset_table   = 0..9, A..Z->a..z, a..z, \
                U+C0->a, U+C1->a, U+C2->a, U+C3->a, U+C4->a, U+C5->a, U+C6->a, \
                U+C7->c,U+E7->c, U+C8->e, U+C9->e, U+CA->e, U+CB->e, U+CC->i, \
                U+CD->i, U+CE->i, U+CF->i, U+D0->d, U+D1->n, U+D2->o, U+D3->o, \
                U+D4->o, U+D5->o, U+D6->o, U+D8->o, U+D9->u, U+DA->u, U+DB->u, \
                U+DC->u, U+DD->y, U+DE->t, U+DF->s, \
                U+E0->a, U+E1->a, U+E2->a, U+E3->a, U+E4->a, U+E5->a, U+E6->a, \
                U+E7->c,U+E7->c, U+E8->e, U+E9->e, U+EA->e, U+EB->e, U+EC->i, \
                U+ED->i, U+EE->i, U+EF->i, U+F0->d, U+F1->n, U+F2->o, U+F3->o, \
                U+F4->o, U+F5->o, U+F6->o, U+F8->o, U+F9->u, U+FA->u, U+FB->u, \
                U+FC->u, U+FD->y, U+FE->t, U+FF->s,

Am I reading this right?  It looks to me as though the table is meant
to fold all upper-case letters to lower case; assuming the same folding
is applied consistently to both search queries and indexed text, that
*should* make searches case-insensitive.  But apparently it doesn't.
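
As a rough illustration of that reading, here is a minimal Python sketch
of how such a table would fold text.  It is not Sphinx's actual code:
the names build_table and fold are made up for this example, the sample
table is a shortened version of the one above, and the sketch assumes
that any character not listed in the table acts as a word separator.

```python
def parse_char(tok):
    """Return the code point for a literal character or a U+XXXX escape."""
    tok = tok.strip()
    if tok.upper().startswith("U+"):
        return int(tok[2:], 16)
    if len(tok) != 1:
        raise ValueError(f"unsupported token: {tok!r}")
    return ord(tok)


def parse_range(spec):
    """Return (first, last) code points for 'x' or 'x..y'."""
    if ".." in spec:
        lo, hi = spec.split("..", 1)
        return parse_char(lo), parse_char(hi)
    cp = parse_char(spec)
    return cp, cp


def build_table(charset_table):
    """Build a {source code point: folded code point} map (hypothetical helper)."""
    table = {}
    # Join backslash-continued lines, then split into individual rules.
    for rule in charset_table.replace("\\\n", " ").split(","):
        rule = rule.strip()
        if not rule:
            continue  # tolerate a trailing comma
        # 'A..Z->a..z' remaps a range; a bare 'a..z' maps each char to itself.
        src, dst = rule.split("->", 1) if "->" in rule else (rule, rule)
        s_lo, s_hi = parse_range(src)
        d_lo, d_hi = parse_range(dst)
        if s_hi - s_lo != d_hi - d_lo:
            raise ValueError(f"range sizes differ in rule: {rule!r}")
        for offset in range(s_hi - s_lo + 1):
            table[s_lo + offset] = d_lo + offset
    return table


def fold(text, table):
    """Fold text through the table; unlisted characters split words (assumption)."""
    folded = []
    for ch in text:
        mapped = table.get(ord(ch))
        folded.append(chr(mapped) if mapped is not None else " ")
    return "".join(folded).split()


# Shortened sample of the charset_table quoted above.
SAMPLE = "0..9, A..Z->a..z, a..z, U+C0->a, U+E9->e,"

if __name__ == "__main__":
    table = build_table(SAMPLE)
    # The same folding is applied to the indexed text and to the query.
    print(fold("MediaWiki Caf\u00e9", table))
    print(fold("SPHINX", table) == fold("sphinx", table))
```

If the folding really works this way at both index time and query time,
"SPHINX" and "sphinx" end up as the same token, which is why I expected
case-insensitive matches.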

The installation went smoothly.  I did not encounter any errors.

I have not changed the charset_table.  The wiki is in English, but I
have gotten errors that seem to indicate character set issues that I
haven't figured out how to fix.  At this moment, I can't remember
what I ran that produced those errors.  (Sorry!)  And I would like
to preserve the option to introduce other languages in the future.

-- 
David Benfell <benfell at parts-unknown.org>
http://www.parts-unknown.org/