We installed CirrusSearch recently to replace Lucene, and the results it returns from wiki searches seem wildly irrelevant at times.
For example, our wiki (200,000+ titles) has a number of pages that include the word "share" in the title. But when I search for "share" using CirrusSearch, none of these pages appears in the top 100 hits. The first such page is hit #120. The #1 hit has "share" only in the names of two of its categories.
As a pathological example, I created a wiki page named "Share share share share share" and filled it with the word "share" over and over. When I search the wiki for "share", my page appears as hit number 750! (If I search for "share share", my page comes up first.)
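For anyone who wants to reproduce hit positions like these without paging through Special:Search, here is a minimal sketch that walks the standard MediaWiki search API (action=query, list=search) and reports where a given title lands. The function names findSearchRank and makeApiFetcher and the endpoint URL in the usage note are hypothetical, chosen for illustration. The fetcher is passed in as a callable so the ranking logic can be checked without a live wiki.

```php
<?php
/**
 * Return the 1-based position of $title in the search results for $query,
 * or null if it does not appear within the first $maxHits results.
 *
 * $fetch is a callable (query, offset, limit) returning the decoded API
 * response as an array. Hypothetical helper, not part of MediaWiki.
 */
function findSearchRank(callable $fetch, string $query, string $title,
                        int $maxHits = 1000, int $batch = 50): ?int {
    for ($offset = 0; $offset < $maxHits; $offset += $batch) {
        $data = $fetch($query, $offset, $batch);
        $hits = array_values($data['query']['search'] ?? []);
        if (count($hits) === 0) {
            break; // no more results to page through
        }
        foreach ($hits as $i => $hit) {
            if ($hit['title'] === $title) {
                return $offset + $i + 1;
            }
        }
    }
    return null;
}

/**
 * Build a fetcher for a live wiki. $apiUrl is the wiki's api.php endpoint
 * (an assumption: substitute your own).
 */
function makeApiFetcher(string $apiUrl): callable {
    return function (string $query, int $offset, int $limit) use ($apiUrl): array {
        $params = http_build_query([
            'action'   => 'query',
            'list'     => 'search',
            'srsearch' => $query,
            'sroffset' => $offset,
            'srlimit'  => $limit,
            'format'   => 'json',
        ]);
        $body = file_get_contents($apiUrl . '?' . $params);
        if ($body === false) {
            return []; // treat a failed request as "no results"
        }
        return json_decode($body, true) ?? [];
    };
}
```

Usage would look something like findSearchRank(makeApiFetcher('https://wiki.example.org/w/api.php'), 'share', 'Share share share share share'), with the URL replaced by the real endpoint. A batch size of 50 is used because that is the usual srlimit ceiling for non-bot accounts.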
We're using the default CirrusSearch.php configuration except for the following overrides in LocalSettings.php:
$wgCirrusSearchWeights = array(
    'title' => 20,            // default 20
    'redirect' => 15,         // default 15
    'category' => 8,          // default 8
    'heading' => 4,           // default 5
    'opening_text' => 3,      // default 3
    'text' => 1,              // default 1
    'auxiliary_text' => 0.5,  // default 0.5
    'file_text' => 0.5,       // default 0.5
);

// Prevent some custom namespaces from showing in search results
$wgCirrusSearchNamespaceWeights = array(
    NS_VP_1 => 0,
    NS_VP_2 => 0,
    NS_VP_3 => 0,
);

$wgCirrusSearchDefaultNamespaceWeight = 0.5;
$wgCirrusSearchPowerSpecialRandom = false;
Any advice on how to debug these weird search results?

Thank you very much,
DanB
mediawiki-l@lists.wikimedia.org