Hi,
Sorry if this message does not show up in the original thread (I subscribed to this mailing list after the original message was posted).
Debugging relevancy problems is always complex because the outcome depends on many factors. It looks like the word "share" is quite common in your corpus, so your problem may be due to the "all" field. The "all" field is a performance hack that allows CirrusSearch to query a single field (see https://phabricator.wikimedia.org/T107666), but it has some drawbacks with common words.
Could you try adding &cirrusUseAllFields=no to the search results URL and see if it affects the ranking? The URL should be: https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&...
Note that if you use the "all" field:
$wgCirrusSearchAllFields = array( 'build' => true, 'use' => true );
and you change the weights in $wgCirrusSearchWeights, you'll have to re-index your data.
Another test you could try is to disable rescore. Rescore is a feature that reorders the top-N results (8192 by default). You can limit its impact by adding the following URL parameters: &cirrusFunctionWindow=1&cirrusPhraseWindow=1
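To make the two URL experiments above easier to repeat, here is a minimal sketch that appends the debugging parameters to an existing search URL. The host wiki.example, the search=share query, and the helper name with_debug_params are assumptions for illustration only; the parameter names cirrusUseAllFields, cirrusFunctionWindow and cirrusPhraseWindow are the ones mentioned in this message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_params(search_url, **params):
    """Return search_url with the given query parameters added or replaced."""
    parts = urlsplit(search_url)
    # Decode the existing query string, keeping the original parameter order.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    # Re-encode the merged parameters and rebuild the URL.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical search URL; replace it with the URL of your own results page.
base = "https://wiki.example/w/index.php?title=Special%3ASearch&profile=default&search=share"

# Variant 1: bypass the "all" field.
no_all_field = with_debug_params(base, cirrusUseAllFields="no")

# Variant 2: shrink the rescore windows to limit the rescore impact.
narrow_rescore = with_debug_params(
    base, cirrusFunctionWindow="1", cirrusPhraseWindow="1"
)

print(no_all_field)
print(narrow_rescore)
```

Opening each printed URL in a browser and comparing it with the unmodified results page should show which of the two features has the larger effect on the ranking.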
If nothing helps, could you share the results of:
- https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&...
- https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&...
Thanks,
David
From: Daniel Barrett <danb@cimpress.com>
Date: Mon, Oct 12, 2015 at 9:57 AM
Subject: [MediaWiki-l] Help debugging CirrusSearch problems?
To: MediaWiki announcements and site admin list <mediawiki-l@lists.wikimedia.org>
We installed CirrusSearch recently to replace Lucene, and the results it returns from wiki searches seem wildly irrelevant at times.
For example, our wiki (200,000+ titles) has a number of pages that include the word "share" in the title. But when I search for "share" using CirrusSearch, none of these pages come up in the top 100 hits. The first one is hit #120. The #1 hit has "share" only in the names of two categories.
As a pathological example, I created a wiki page named "Share share share share share" and filled it with the word "share" over and over. When I search the wiki for "share", my page appears as hit number 750! (If I search for "share share", my page comes up first.)
We're using the default CirrusSearch.php configuration except for the following overrides in LocalSettings.php:
$wgCirrusSearchWeights = array(
    'title' => 20,            // default 20
    'redirect' => 15,         // default 15
    'category' => 8,          // default 8
    'heading' => 4,           // default 5
    'opening_text' => 3,      // default 3
    'text' => 1,              // default 1
    'auxiliary_text' => 0.5,  // default 0.5
    'file_text' => 0.5,       // default 0.5
);

// Prevent some custom namespaces from showing in search results
$wgCirrusSearchNamespaceWeights = array(
    NS_VP_1 => 0,
    NS_VP_2 => 0,
    NS_VP_3 => 0,
);

$wgCirrusSearchDefaultNamespaceWeight = 0.5;
$wgCirrusSearchPowerSpecialRandom = false;
Any advice on how to debug these weird search results? Thank you very much, DanB
MediaWiki-l mailing list To unsubscribe, go to: https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
David Causse writes:
Could you try to add &cirrusUseAllFields=no to the search results URL and see if it affects the ranking? The URL should be : https://mediawiki/w/index.php?title=Special%3ASearch&profile=default&...
Yes, that helped a lot!! Thank you!!
Another test you could try is to disable rescore... (&cirrusFunctionWindow=1&cirrusPhraseWindow=1)
I tried that alone (without changing cirrusUseAllFields), and it produced very relevant results but appeared to downrank the main namespace. So the top hits were all in the File namespace, the Template namespace, etc.
Then I tried it together with cirrusUseAllFields=no, and the results were not as good as with cirrusUseAllFields=no alone.
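For anyone who wants to make this kind of comparison less subjective, here is a minimal sketch that reports where the first title containing the search term appears in each result list. The function names and the sample titles are hypothetical placeholders, not real results from our wiki; the lists would have to be collected from each search variant by hand or by a script.

```python
def first_rank(titles, term):
    """Return the 1-based position of the first title containing term, or None."""
    needle = term.lower()
    for position, title in enumerate(titles, start=1):
        # Case-insensitive substring match, so "shareholder" also counts.
        if needle in title.lower():
            return position
    return None


def compare_variants(results_by_variant, term):
    """Map each variant name to the rank of its first matching title."""
    return {
        name: first_rank(titles, term)
        for name, titles in results_by_variant.items()
    }


# Placeholder data: ordered result titles for two hypothetical search variants.
sample = {
    "default": ["Quarterly report", "Team directory", "Share share share share share"],
    "cirrusUseAllFields=no": ["Share share share share share", "Quarterly report", "Team directory"],
}

ranks = compare_variants(sample, "share")
print(ranks)
```

A lower number means the first matching title appears earlier in the results, and None means no title in that list contains the term.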
Thank you so much. I wouldn't have found this on my own. DanB
I cannot understand your letter.
On Oct 14, 2015 at 17:19, "Daniel Barrett" danb@cimpress.com wrote: