I'm just starting to implement search in the new codebase--the last
major piece of the puzzle that's still missing.
I just found another annoyance and I don't know if there's a
workaround or not. MySQL treats the ' character as part of a
word. It's odd that it's otherwise so restrictive but allows
that one--only letters, digits, underscore, and '. The upshot
of this in wikitext is that if the only appearance of a word
in an article is in bold or italics, i.e., ''like this'', MySQL
will index "'''word'''", but not "word", and so a search for "word"
will fail. So, ironically, it throws away references which we
have specifically emphasized.
Is there a way to change this behavior of MySQL, or is this one
more reason to give up and preprocess the whole text of each
article like we do for cur_ind_title?
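
For what it's worth, here is a rough sketch of what the preprocessing
route might look like. It's in Python purely for illustration, and the
function names (strip_emphasis_markup, index_words) are made up, not
anything in the codebase. The tokenizer only approximates the behavior
described above (letters, digits, underscore, and '); it is not MySQL's
actual parser.

```python
import re

# Assumption: a run of two or more apostrophes is wiki emphasis markup
# (''italic'', '''bold'''), while a single apostrophe belongs to the word.
_QUOTE_MARKUP = re.compile(r"'{2,}")


def strip_emphasis_markup(wikitext):
    """Replace emphasis markup with a space so the enclosed words stand bare."""
    return _QUOTE_MARKUP.sub(" ", wikitext)


def index_words(wikitext):
    """Return the lowercased words an indexer would see after stripping markup.

    Words are runs of letters, digits, underscore, and ', mimicking the
    tokenization described in this post (an approximation, not MySQL itself).
    """
    cleaned = strip_emphasis_markup(wikitext)
    words = []
    for token in re.findall(r"[\w']+", cleaned):
        token = token.strip("'")  # drop stray leading/trailing apostrophes
        if token:
            words.append(token.lower())
    return words
```

The stripped text would go into a separate column used only for the
index, so the article text itself is left alone. Single apostrophes
inside a word (it's, don't) are kept as they are.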