On 10/26/07, Rolf Lampa <rolf.lampa(a)rilnet.com> wrote:
Soundex.
Title variants, very often due to differences in spelling, are
an old problem which was solved a long time ago, long before computers
came about. The (old) solution was based on the fact that sound
subsumes differences in spelling etc., hence "Soundex":
Heh. No. Soundex is awful. There might be something better by now, but
not Soundex. Anything but that.
In a previous job I briefly flirted with it to perform name matching, but it
(or the SQL Server implementation at least) is useless - it collapses any
name down to a four-character code (the first letter plus three digits for
the consonants that follow), making Steve and Stove identical, for instance.
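To make the collision concrete, here is a minimal sketch of the commonly described American Soundex rules in Python. It is an illustration only and is not claimed to match the SQL Server implementation mentioned above, which may differ in edge cases; the function name `soundex` is just chosen for this example.

```python
# Consonant groups used by the commonly described American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        code = CODES.get(ch, "")  # vowels (and anything unmapped) give ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros, truncate to four characters


print(soundex("Steve"), soundex("Stove"))
```

Both names come out with the same code because the vowels are discarded and only S, t and v contribute.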
Anyway, a Soundex-like tool might be useful to complement or improve
searching, but the situation I'm describing here is when you know exactly
what search terms you want to reach, but it's a lot of effort to create all
those redirects.
There's been a better alternative to Soundex for many years called
Metaphone. I think there are even several variants of it these days.
I did some tests with Soundex and Metaphone when I was developing my
DidYouMean extension. It's not too hard to use a different normalization
algorithm. I also tried anagrams and textonyms.
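
For readers unfamiliar with the idea, the sketch below shows roughly what swapping in a different normalization algorithm could look like: every title is reduced to a key, and titles sharing a key become suggestions for each other. This is not the DidYouMean extension's code; it is a Python illustration, and all names in it (`anagram_key`, `textonym_key`, `build_index`, `did_you_mean`) are made up for this example.

```python
from collections import defaultdict

# Standard phone keypad layout, used for the textonym key.
KEYPAD = {
    ch: digit
    for digit, letters in {
        "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    }.items()
    for ch in letters
}


def anagram_key(title: str) -> str:
    """Sort the characters, so transposed letters map to the same key."""
    return "".join(sorted(c for c in title.lower() if c.isalnum()))


def textonym_key(title: str) -> str:
    """Digits a phone keypad would produce; other characters are dropped."""
    return "".join(KEYPAD.get(c, "") for c in title.lower())


def build_index(titles, normalize):
    """Group existing titles by their normalized key."""
    index = defaultdict(list)
    for title in titles:
        index[normalize(title)].append(title)
    return index


def did_you_mean(query, index, normalize):
    """Return existing titles whose key matches the query's key."""
    return [t for t in index.get(normalize(query), []) if t != query]


index = build_index(["Listen", "Metaphone"], anagram_key)
print(did_you_mean("Metaphoen", index, anagram_key))
```

Passing `textonym_key` (or a Soundex or Metaphone function) as `normalize` changes which titles are treated as equivalent without touching the rest of the lookup.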
Andrew Dunbar (hippietrail)
Steve