Hi all, Where are Search stop-words stored? I suspect that the word "gay" may be set as a stop-word by default, because on my MediaWiki setup searching for it gives no results, even though it appears on at least one page, and searching for any other word on that page brings the page up in the search results.
Thanks, Matias
Matias Pelenur wrote:
Hi all, Where are Search stop-words stored? I have the suspicion that the word "gay" may be set as a stop-word by default, because on my MediaWiki setup searching for it gives no results, even though it appears on at least one page, and searching for any other words on that page show up in the search page.
There is a stopword list hard-coded into MySQL. (In 4.0 and up this can be overridden by server-wide configuration.) MediaWiki includes a copy of the default stopword list (FulltextStoplist.php) so that it can strip those words out of multiple-word searches. If you search for "the united nations", it searches only for "united" and "nations"; otherwise the search for "the" would return no results, and the whole query would match nothing, "united" and "nations" included. I think this copy is only used in MySQL 3 mode, so if you configured for MySQL 4 it isn't used, and it's up to the list actually in MySQL.
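The stopword stripping described above can be sketched in a few lines. This is only an illustration, not MediaWiki's actual code: the function name and the tiny sample list are made up for the example, and the real default list in MySQL is much longer.

```python
# Hypothetical sample only; the real default stopword list in MySQL is much longer.
SAMPLE_STOPWORDS = {"the", "and", "of", "a", "in"}


def strip_stopwords(query, stopwords=SAMPLE_STOPWORDS):
    """Return the query terms with stopwords removed, keeping their order."""
    return [word for word in query.lower().split() if word not in stopwords]


# Only "united" and "nations" would be passed on to the search engine.
print(strip_stopwords("the united nations"))
```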
Also, words appearing in at least half of the rows being searched will not match; this particularly affects very small databases.
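To see why small databases are hit hardest by the 50% rule, here is a rough sketch. It assumes each row is one page's text and that a word counts as "too common" once it appears in half or more of the rows; the function name and the sample rows are invented for illustration, and MySQL's real relevance calculation is more involved than a simple count.

```python
def too_common_words(rows, threshold=0.5):
    """Return the words that appear in at least `threshold` of the rows."""
    total = len(rows)
    if total == 0:
        return set()
    counts = {}
    for text in rows:
        # Count each word once per row, however often it repeats in that row.
        for word in set(text.lower().split()):
            counts[word] = counts.get(word, 0) + 1
    return {word for word, n in counts.items() if n / total >= threshold}


# With only three pages, a word on two of them is already over the limit.
rows = ["wiki search help", "wiki markup", "upload files"]
print(sorted(too_common_words(rows)))
```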
However, your problem is most likely the minimum word length limit. I believe the default is four characters, so "gay" would not be found, nor would "tea", "gun", "war" or "hat".
For the MySQL 3 mode we also trim out short words before passing the query to the search engine; you can override this by setting the variable $wgDBminWordLen in LocalSettings.php. I don't think the check is done in MySQL 4 mode, since that uses a more advanced search mode in the MySQL engine, but I'm not sure offhand. Either way, you may still need to adjust MySQL itself; see: http://dev.mysql.com/doc/mysql/en/Fulltext_Fine-tuning.html
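The short-word trimming amounts to something like the sketch below. The function name is hypothetical, and the min_word_len parameter merely stands in for the role $wgDBminWordLen plays; this is not the code MediaWiki actually runs.

```python
def drop_short_words(query, min_word_len=4):
    """Keep only the terms that are at least min_word_len characters long."""
    return [word for word in query.split() if len(word) >= min_word_len]


# With a limit of four characters, "gay" never reaches the search engine.
print(drop_short_words("gay rights history"))

# Lowering the limit to three lets it through.
print(drop_short_words("gay rights history", min_word_len=3))
```

Lowering the limit on the MediaWiki side alone is not enough if MySQL's own index still ignores three-letter words, which is why the MySQL setting may need changing too.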
-- brion vibber (brion @ pobox.com)
Cool, I set the variable in LocalSettings, added a line "ft_min_word_len=3" to my.cnf, repaired the searchindex table, and now it works just fine!
Thanks, matias
MediaWiki-l mailing list MediaWiki-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/mediawiki-l