The point of the parser was to detect the cases where the query was not well-formed (unbalanced brackets, "A and or B", et cetera) and then give some kind of syntax error indicating what is wrong. What do you do with those cases now?
Now I let MySQL choke on it, and report back its error, which is actually much more useful than it sounds. The part of the program that reports MySQL errors is very good; I originally made it that way for debugging, but it's not bad for this case either. I just don't see that much benefit from making nicer error messages on badly formed searches.
BTW, is it correct that you only highlight one search word per line in the result of the search?
I've fixed that, and also the word-boundary problem, but it does still limit the context display to 60 characters before and after the first hit of each line. Personally, I think that's plenty to get a sense of context, but I might be convinced otherwise.
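A minimal sketch of the highlighting described above, in Python (the thread shows no code, so the language and the name `highlight_line` are assumptions). Every search term is matched as a whole word using regex `\b` boundaries, all hits in the window are marked rather than only one, and the context is limited to 60 characters on either side of the first hit. Hits that the window would cut off are left out rather than highlighted as fragments; the `<b>` markup is a placeholder for whatever the wiki actually emits.

```python
import re

CONTEXT = 60  # characters of context before and after the first hit

def highlight_line(line, terms, context=CONTEXT):
    """Return a snippet of `line` around its first hit, with every whole-word
    occurrence of any term wrapped in <b>...</b>, or None if nothing matches.
    `terms` must be a non-empty list of plain words."""
    # re.escape keeps punctuation in a term literal; \b restricts matches to whole words.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    hits = list(pattern.finditer(line))
    if not hits:
        return None
    start = max(0, hits[0].start() - context)
    end = min(len(line), hits[0].end() + context)
    parts, pos = [], start
    for m in hits:
        if m.end() > end:
            break  # later hits lie even further right, so they are all outside the window
        parts.append(line[pos:m.start()])
        parts.append("<b>" + m.group(0) + "</b>")
        pos = m.end()
    parts.append(line[pos:end])
    return "".join(parts)

print(highlight_line("The cat sat on the mat with another Cat; no concatenation.", ["cat"]))
# prints: The <b>cat</b> sat on the mat with another <b>Cat</b>; no concatenation.
```

Note that "concatenation" is not highlighted: the `\b` anchors are what fix the word-boundary problem.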
Ah, I see, sorry for not checking your code first. So that is why the scoring doesn't work anymore. The simplest way to get scoring back is probably to not eliminate duplicates when processing.
Hmm. That might be a good idea. I might even be able to add extra duplicates for words in headings or something. I initially assumed that eliminating duplicates would speed the search, but if it hurts the scoring, that's not a good tradeoff.
Not necessarily. The usual way to do this is to define your own index table like Text_index(word, article, #occurrences), then let MySQL compute some sort of score and sort on that. This is tricky if you have OR and NOT, but with only AND it is easy.
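A minimal sketch of the AND-only scoring query, assuming a table `text_index(word, article, occurrences)` (the `#occurrences` column renamed to a legal identifier). It uses Python's built-in sqlite3 module as a stand-in for MySQL; the query shape carries over, though MySQL drivers for Python typically use `%s` placeholders instead of `?`. Scoring by summed occurrences is just one simple choice. The `HAVING COUNT(DISTINCT word) = n` clause is what makes AND easy: an article qualifies only if all n search words appear in it, whereas OR and NOT don't reduce to a single count like this.

```python
import sqlite3

def and_search(conn, words):
    """Articles containing ALL of `words`, best score first.
    Score = total occurrences of the search words in the article."""
    words = sorted(set(w.lower() for w in words))
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    sql = (
        "SELECT article, SUM(occurrences) AS score "
        "FROM text_index "
        f"WHERE word IN ({placeholders}) "
        "GROUP BY article "
        # AND semantics: keep an article only if every search word matched it.
        "HAVING COUNT(DISTINCT word) = ? "
        "ORDER BY score DESC, article"
    )
    return conn.execute(sql, (*words, len(words))).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE text_index (word TEXT, article TEXT, occurrences INTEGER, "
    "PRIMARY KEY (word, article))"
)
conn.executemany("INSERT INTO text_index VALUES (?, ?, ?)", [
    ("cat", "Cat", 12), ("dog", "Cat", 2),
    ("cat", "Pet", 3), ("dog", "Pet", 4),
    ("dog", "Dog", 15),
])
print(and_search(conn, ["cat", "dog"]))  # prints: [('Cat', 14), ('Pet', 7)]
```

The article "Dog" is excluded because it contains only one of the two words.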
I think I'll try removing the dup-stripping first.
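A minimal sketch of indexing without the dup-stripping, assuming Python and a hypothetical `HEADING_WEIGHT`: each word's count is its number of occurrences, and words in headings are counted extra, as suggested above. Headings are assumed to be lines written as `== Title ==`; that markup is an assumption and should be adjusted to whatever the wiki actually uses. Counts like these are the kind of per-article numbers that could fill the occurrences column of a Text_index table.

```python
import re
from collections import Counter

HEADING_WEIGHT = 3  # hypothetical: one heading occurrence counts like three body occurrences
WORD = re.compile(r"\w+")

def word_counts(text):
    """Per-article word counts with duplicates kept (not stripped), so a word
    that appears often, or in a heading, scores higher."""
    counts = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a heading is a line of the form "== Title ==".
        is_heading = len(stripped) > 4 and stripped.startswith("==") and stripped.endswith("==")
        weight = HEADING_WEIGHT if is_heading else 1
        for word in WORD.findall(line.lower()):
            counts[word] += weight
    return counts

counts = word_counts("== Cats ==\nA cat is not a dog.\nCats and cats.")
print(counts["cats"], counts["cat"])  # prints: 5 1
```

With duplicates stripped, every word in the article would get the same count of 1, which is why the scoring stopped distinguishing articles.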
On Wed, Jun 12, 2002 at 02:37:43PM -0700, lcrocker@nupedia.com wrote:
The point of the parser was to detect the cases where the query was not well-formed (unbalanced brackets, "A and or B", et cetera) and then give some kind of syntax error indicating what is wrong. What do you do with those cases now?
Now I let MySQL choke on it, and report back its error, which is actually much more useful than it sounds. The part of the program that reports MySQL errors is very good; I originally made it that way for debugging, but it's not bad for this case either. I just don't see that much benefit from making nicer error messages on badly formed searches.
Agreed, although telling the user that there has been an "unrecoverable database error" and asking him or her to inform the administrator might be a bit overdone in this case. :-)
BTW, is it correct that you only highlight one search word per line in the result of the search?
I've fixed that, and also the word-boundary problem, but it does still limit the context display to 60 characters before and after the first hit of each line. Personally, I think that's plenty to get a sense of context, but I might be convinced otherwise.
FWIW, I think it's enough. I would even vote for limiting the number of lines you report to a maximum of 5 or so.
Ah, I see, sorry for not checking your code first. So that is why the scoring doesn't work anymore. The simplest way to get scoring back is probably to not eliminate duplicates when processing.
Hmm. That might be a good idea. I might even be able to add extra duplicates for words in headings or something. I initially assumed that eliminating duplicates would speed the search, but if it hurts the scoring, that's not a good tradeoff.
It just takes up disk space. What determines the speed is the index structure, and unless MySQL is doing something really stupid here, that is not going to get much bigger.
-- Jan Hidders
wikipedia-l@lists.wikimedia.org