Re: [Wikitech-l] update: new MySQL search index committed to CVS

14 Feb 2002

One cute trick that I have often used is to calculate the
"uselessness" of a particular word.  A word is semantically more
useless the more often it appears in the set being searched.  This
has really dramatic empirical results for the better, especially on
small datasets.  (Maybe on really big ones, too, but I've never played
with those.)

Thus if someone searches for 'John Malkovich' they get a good result,
because 'John' is not weighted so heavily -- it's a more useless word
because it appears more often in the search set.  But 'Malkovich', now
you're talking, there's a word that _means something_.
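
A rough sketch of the idea in Python, for illustration only.  The
message does not give a formula, so the log(N / df) weighting here is
an assumption (it is the usual inverse-document-frequency form), and
the names word_weights, score and the three sample documents are made
up for the example.

```python
import math
from collections import Counter


def word_weights(documents):
    """Map each word to a weight that shrinks as it appears in more documents."""
    doc_count = len(documents)
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document, however often it repeats there.
        doc_freq.update(set(text.lower().split()))
    # Assumed formula: log(N / df). A word found in every document gets
    # weight 0, i.e. it is maximally "useless".
    return {word: math.log(doc_count / df) for word, df in doc_freq.items()}


def score(query, text, weights):
    """Sum the weights of the query words that occur in the text."""
    words = set(text.lower().split())
    return sum(weights.get(w, 0.0) for w in query.lower().split() if w in words)


# Hypothetical sample data: 'john' is everywhere, 'malkovich' is rare.
documents = [
    "John Malkovich is an actor",
    "John Lennon was a musician",
    "John Adams was a president",
]
weights = word_weights(documents)
ranked = sorted(documents,
                key=lambda d: score("John Malkovich", d, weights),
                reverse=True)
```

With this data 'john' contributes nothing to the score, so the
ranking is decided entirely by 'malkovich'.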
