[MediaWiki-l] Help debugging CirrusSearch problems?

David Causse dcausse at wikimedia.org
Tue Oct 13 09:12:05 UTC 2015


Hi,

Sorry if this message does not show up in the original thread (I 
subscribed to this list after the original message was posted).

Relevancy problems are always hard to debug because they depend on many 
factors.
The word "share" looks quite common in your corpus, so your problem may 
be due to the "all" field.
The all field is a performance hack that lets Cirrus query a single 
field (see https://phabricator.wikimedia.org/T107666), but it has some 
drawbacks with common words.

Could you try adding &cirrusUseAllFields=no to the search results URL 
and see if it affects the ranking?
The URL should be:
https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&search=share&fulltext=Search&cirrusUseAllFields=no
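
If you end up testing several queries, a small script can build these 
URLs for you. This is only a sketch: build_search_url is a made-up 
helper name, and "https://mediawiki/w/index.php" is the placeholder 
host used above, so substitute your own wiki.

```python
from urllib.parse import urlencode


def build_search_url(base, term, **debug_params):
    """Return a Special:Search URL for `term` plus optional debug parameters."""
    params = {
        "title": "Special:Search",
        "profile": "default",
        "search": term,
        "fulltext": "Search",
    }
    # Extra keyword arguments (e.g. cirrusUseAllFields="no") are appended as-is.
    params.update(debug_params)
    return f"{base}?{urlencode(params)}"


url = build_search_url(
    "https://mediawiki/w/index.php",  # placeholder host, replace with yours
    "share",
    cirrusUseAllFields="no",
)
print(url)
```

Opening the URL with and without the extra parameter lets you compare 
the two rankings side by side.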

Note that if you use the all field, i.e.
$wgCirrusSearchAllFields = array( 'build' => true, 'use' => true );
and you change weights in $wgCirrusSearchWeights, you'll have to 
re-index your data.

Another test you could try is to disable rescore. Rescore is a feature 
that reorders the top-N results (8192 by default). You can limit its 
impact by adding the following URL parameters:
&cirrusFunctionWindow=1&cirrusPhraseWindow=1
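
To show why a window of 1 effectively switches the reordering off, 
here is a toy model of a rescore window. It is not the Elasticsearch 
or CirrusSearch implementation; the documents, scores, and the simple 
"add the second score" rule are invented for illustration.

```python
def rescore(ranked, extra_score, window):
    """Re-sort only the first `window` hits using a second-pass score.

    ranked: list of (doc, base_score), already sorted by base_score descending.
    extra_score: dict mapping doc -> second-pass score (missing docs get 0).
    Hits beyond the window keep their first-pass order.
    """
    head = ranked[:window]
    tail = ranked[window:]
    rescored = sorted(
        head,
        key=lambda item: item[1] + extra_score.get(item[0], 0.0),
        reverse=True,
    )
    return [doc for doc, _ in rescored] + [doc for doc, _ in tail]


first_pass = [("A", 3.0), ("B", 2.5), ("C", 2.0), ("D", 1.0)]
boosts = {"C": 5.0, "D": 9.0}

print(rescore(first_pass, boosts, window=3))  # ['C', 'A', 'B', 'D']
print(rescore(first_pass, boosts, window=1))  # ['A', 'B', 'C', 'D']
```

With window=3 the boosted document C moves to the top, while D stays 
last despite its large boost because it falls outside the window. With 
window=1 the first-pass order is returned unchanged.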

If nothing helps, could you share the output of
- https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&search=share&fulltext=Search&cirrusDumpQuery
and
- https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&search=share&fulltext=Search&cirrusDumpResult
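
If it is easier to attach files than to paste browser output, the two 
dumps could be saved with something like the sketch below. The helper 
names dump_url and save_dump are made up, the host is the same 
placeholder as above, and the download assumes the wiki can be read 
without logging in (a private wiki would need session cookies, which 
this does not handle).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://mediawiki/w/index.php"  # placeholder host, replace with yours


def dump_url(term, flag):
    """Return the search URL for `term` with a bare debug flag appended."""
    query = urlencode({
        "title": "Special:Search",
        "profile": "default",
        "search": term,
        "fulltext": "Search",
    })
    # The flag is added without "=value", matching the URLs listed above.
    return f"{BASE}?{query}&{flag}"


def save_dump(term, flag, path):
    """Download one dump and write the raw response to `path`."""
    with urlopen(dump_url(term, flag)) as response, open(path, "wb") as out:
        out.write(response.read())


for flag in ("cirrusDumpQuery", "cirrusDumpResult"):
    print(dump_url("share", flag))
```

Calling save_dump("share", "cirrusDumpQuery", "query.txt") and the 
same with "cirrusDumpResult" would then write one file per dump.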

Thanks,

David

> From: Daniel Barrett <danb at cimpress.com>
> Date: Mon, Oct 12, 2015 at 9:57 AM
> Subject: [MediaWiki-l] Help debugging CirrusSearch problems?
> To: MediaWiki announcements and site admin list 
> <mediawiki-l at lists.wikimedia.org>
>
>
> We installed CirrusSearch recently to replace Lucene, and the results 
> it returns from wiki searches seem wildly irrelevant at times.
>
> For example, our wiki (200,000+ titles) has a number of pages that 
> include the word "share" in the title. But when I search for "share" 
> using CirrusSearch, none of these pages come up in the top 100 hits.  
> The first one is hit #120. The #1 hit has "share" only in the names of 
> two categories.
>
> As a pathological example, I created a wiki page named "Share share 
> share share share" and filled it with the word "share" over and over. 
> When I search the wiki for "share", my page appears as hit number 
> 750!  (If I search for "share share", my page comes up first.)
>
> We're using the default CirrusSearch.php configuration except for the 
> following overrides in LocalSettings.php:
>
> $wgCirrusSearchWeights = array(
>    'title' => 20,   // default 20
>    'redirect' => 15,   // default 15
>    'category' => 8,   // default 8
>    'heading' => 4,   // default 5
>    'opening_text' => 3,   // default 3
>    'text' => 1,   // default 1
>    'auxiliary_text' => 0.5,   // default 0.5
>    'file_text' => 0.5,   // default 0.5
> );
>
> // Prevent some custom namespaces from showing in search results
> $wgCirrusSearchNamespaceWeights = array(
>         NS_VP_1 => 0,
>         NS_VP_2 => 0,
>         NS_VP_3  => 0,
> );
>
> $wgCirrusSearchDefaultNamespaceWeight = 0.5;
> $wgCirrusSearchPowerSpecialRandom = false;
>
> Any advice on how to debug these weird search results?
> Thank you very much,
> DanB
>
>
> _______________________________________________
> MediaWiki-l mailing list
> To unsubscribe, go to:
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
>


