Re: [Wikipedia-l] Searching in new codebase

12 Jun 2002


      ...
> The point of the parser was to detect the cases where the query
> was not well-formed (unbalanced brackets, "A and or B", et cetera)
> and then give some kind of syntax error to indicate what is wrong.
> What do you do with those cases now?
Now I let MySQL choke on it, and report back its error, which is
actually much more useful than it sounds.  The part of the program
that reports MySQL errors is very good; I originally made it that
way for debugging, but it's not bad for this case either.  I just
don't see that much benefit from making nicer error messages on
badly formed searches.
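For comparison, here is a minimal sketch of the kind of up-front check
the quoted question describes. It assumes a query language of bare
words, parentheses and the operators AND, OR and NOT, with NOT as a
prefix. The function check_query and its messages are hypothetical.
They are not taken from the old parser or from the new codebase.

```python
import re

BINARY_OPERATORS = {"and", "or"}


def check_query(query: str) -> list[str]:
    """Return a list of syntax problems in a boolean search query (empty if none)."""
    errors = []
    # Tokens are single parentheses or runs of other non-space characters.
    tokens = re.findall(r"[()]|[^\s()]+", query)
    depth = 0    # current parenthesis nesting level
    prev = None  # kind of the previous token: word, op, not, open, close
    for tok in tokens:
        low = tok.lower()
        if tok == "(":
            depth += 1
            kind = "open"
        elif tok == ")":
            if prev in ("op", "not", "open"):
                errors.append("empty or incomplete group before ')'")
            if depth == 0:
                errors.append("unmatched ')'")
            else:
                depth -= 1
            kind = "close"
        elif low in BINARY_OPERATORS:
            # AND/OR need something on their left, which catches "A and or B".
            if prev in (None, "op", "not", "open"):
                errors.append(f"operator '{tok}' is missing a left operand")
            kind = "op"
        elif low == "not":
            kind = "not"
        else:
            kind = "word"
        prev = kind
    if prev in ("op", "not"):
        errors.append("query ends with an operator")
    if depth > 0:
        errors.append(f"{depth} unclosed '('")
    return errors
```

For example, check_query("A and or B") returns
["operator 'or' is missing a left operand"]. Letting MySQL report the
error instead avoids maintaining a second grammar like this one.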
...
> Btw, is it correct that you only highlight one search word per line
> in the result of the search?
I've fixed that, and also the word-boundary problem, but it does
still limit the context display to 60 characters before and after
the first hit of each line.  Personally, I think that's plenty to
get a sense of context, but I might be convinced otherwise.
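The following is a minimal sketch of the display behaviour described
above. Every search word is highlighted on word boundaries, and only
60 characters on either side of the first hit in each line are shown.
The function snippet, the <b> markup and the "..." markers are
illustrative assumptions, not the actual code.

```python
import html
import re
from typing import List, Optional

CONTEXT = 60  # characters of context kept before and after the first hit


def snippet(line: str, words: List[str], context: int = CONTEXT) -> Optional[str]:
    """Return an HTML-highlighted excerpt of `line`, or None if no search word occurs."""
    if not words:
        return None
    # \b anchors match whole words only, so "art" does not light up inside "part".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE
    )
    first = pattern.search(line)
    if first is None:
        return None
    start = max(0, first.start() - context)
    end = min(len(line), first.end() + context)
    pieces = []
    pos = start
    # Highlight every hit inside the window, not just the first one.
    for m in pattern.finditer(line):
        if m.start() < start or m.end() > end:
            continue  # skip hits that fall (partly) outside the window
        pieces.append(html.escape(line[pos:m.start()]))
        pieces.append("<b>" + html.escape(m.group(0)) + "</b>")
        pos = m.end()
    pieces.append(html.escape(line[pos:end]))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(line) else ""
    return prefix + "".join(pieces) + suffix
```

The sketch collects matches from the whole line and filters them
against the window. This avoids highlighting a fragment of a word
that was cut off at the edge of the excerpt.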
...
> Ah, I see; sorry for not checking your code first. So that is why
> the scoring doesn't work anymore. The simplest way to get scoring
> back is probably to not eliminate duplicates when processing.
Hmm. That might be a good idea.  I might even be able to add extra
duplicates for words in headings or something.  I initially assumed
that eliminating duplicates would speed the search, but if it hurts
the scoring, that's not a good tradeoff.
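Here is a rough sketch of why keeping duplicates restores scoring,
including the idea of adding extra copies for heading words. The
example makes several assumptions. Headings are taken to be lines
wrapped in "=" signs. Words are lowercase alphanumeric runs. The
function names index_words and score are hypothetical, and
HEADING_BOOST is a hypothetical weight.

```python
import re
from collections import Counter
from typing import List

HEADING_BOOST = 2  # hypothetical: extra copies added for each heading word
WORD = re.compile(r"[a-z0-9]+")


def index_words(wikitext: str, heading_boost: int = HEADING_BOOST) -> List[str]:
    """Return the article's words for indexing, keeping duplicates so frequency survives."""
    words = []
    for line in wikitext.splitlines():
        tokens = WORD.findall(line.lower())
        stripped = line.strip()
        # Assumption: a heading is a line such as "== Paris ==".
        is_heading = len(stripped) > 1 and stripped.startswith("=") and stripped.endswith("=")
        copies = 1 + heading_boost if is_heading else 1
        for tok in tokens:
            words.extend([tok] * copies)
    return words


def score(words: List[str], query_terms: List[str]) -> int:
    """Sum the occurrences of each query term in the (duplicate-preserving) word list."""
    counts = Counter(words)
    return sum(counts[t.lower()] for t in query_terms)
```

With duplicates stripped, for example score(sorted(set(words)), ["paris"]),
every matching article gets the same score of 1 per term. That is the
loss of ranking discussed above.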
...
> Not necessarily. The usual way to do this is to define your own index
> table like
>   Text_index(word, article, #occurrences)
> and then you let MySQL compute some sort of scoring and sort on
> that. This is tricky if you have OR and NOT, but with only AND
> it is easy.
I think I'll try removing the dup-stripping first.
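The Text_index approach suggested above can be sketched with Python's
built-in sqlite3 module standing in for MySQL. This substitution is an
assumption made to keep the example self-contained, and the SQL itself
is plain enough to carry over. The table follows the message, with
#occurrences spelled occurrences. Whitespace tokenization and the
functions build_index and search_and are hypothetical. For an AND
query, the scoring reduces to a GROUP BY with a HAVING clause that
keeps only articles matching every term.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple


def build_index(conn: sqlite3.Connection, articles: Dict[str, str]) -> None:
    """Create text_index(word, article, occurrences) and fill it from raw article text."""
    conn.execute(
        "CREATE TABLE text_index (word TEXT, article TEXT, occurrences INTEGER, "
        "PRIMARY KEY (word, article))"
    )
    for title, text in articles.items():
        counts = Counter(text.lower().split())
        conn.executemany(
            "INSERT INTO text_index (word, article, occurrences) VALUES (?, ?, ?)",
            [(w, title, n) for w, n in counts.items()],
        )


def search_and(conn: sqlite3.Connection, terms: List[str]) -> List[Tuple[str, int]]:
    """Return articles containing ALL terms, ranked by total occurrences of those terms."""
    unique = sorted({t.lower() for t in terms})
    if not unique:
        return []
    placeholders = ", ".join("?" for _ in unique)
    # One row per (word, article), so COUNT(*) equals the number of distinct terms matched.
    sql = (
        "SELECT article, SUM(occurrences) AS score FROM text_index "
        f"WHERE word IN ({placeholders}) "
        "GROUP BY article HAVING COUNT(*) = ? "
        "ORDER BY score DESC, article"
    )
    return conn.execute(sql, [*unique, len(unique)]).fetchall()
```

Dropping the HAVING clause gives an OR-style ranking. NOT needs an
extra anti-join against the excluded words, which hints at why mixed
OR/NOT queries are harder to score this way.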