I'm just starting to implement search in the new codebase, the last major piece of the puzzle that's still missing.
I just found another annoyance, and I don't know whether there's a workaround. MySQL treats the ' character as part of a word. That's odd, given how restrictive it is otherwise: only letters, digits, underscore, and ' count as word characters. The upshot for wikitext is that if the only appearance of a word in an article is in bold or italics, i.e., '''like this''' or ''like this'', MySQL will index "'''word'''" or "''word''", but not "word", and so a search for "word" will fail. So, ironically, it throws away exactly the references we have specifically emphasized.
Is there a way to change this behavior of MySQL, or is this one more reason to give up and pre-process the whole text of each article like we do for cur_ind_title?
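To make the pre-processing option concrete, here is a rough sketch of what I have in mind, written in Python purely for illustration (it is not the codebase's language, and the function names are made up). It assumes that any run of two or more apostrophes is bold/italic markup and that a single apostrophe inside a word should be kept:

```python
import re

# Assumption: two or more consecutive apostrophes are wikitext
# bold/italic markers; a lone apostrophe (as in "don't") is real text.
_QUOTE_RUN = re.compile(r"'{2,}")


def strip_quote_markup(wikitext: str) -> str:
    """Replace bold/italic apostrophe runs with a space."""
    return _QUOTE_RUN.sub(" ", wikitext)


def text_for_index(wikitext: str) -> str:
    """Hypothetical helper: build the text to store in the indexed column."""
    # Collapse the extra whitespace left behind by the substitution.
    return " ".join(strip_quote_markup(wikitext).split())
```

The result would go into a separate indexed column, so the fulltext index sees the bare word while the article text itself stays untouched.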