On Fri, Nov 2, 2018 at 3:51 AM Hogan (US), Michael C < Michael.C.Hogan2@boeing.com> wrote:
Can anyone point me to a starting point for learning about how to tune CirrusSearch (or examples)? I found the CirrusSearchScoreBuilder page [1], which implies it is possible to modify how search results are ranked. But, the documentation page hasn't been created yet. Thank you!
Hi,
there are many ways to tune the ranking of search results. The hook you mention is designed for extensions that want to tune everything related to the search query itself. I strongly discourage using it: it is highly experimental and will be removed in the future.
To understand how Cirrus scores documents, I suggest starting with this documentation [2]. You can then tune the retrieval query using profiles and the $wgCirrusSearchFullTextQueryBuilderProfiles config array, e.g.:

    $wgCirrusSearchFullTextQueryBuilderProfiles = [
        'my_custom_profile' => [
            'builder_class' => \CirrusSearch\Query\FullTextSimpleMatchQueryBuilder::class,
            'settings' => [
                'default_min_should_match' => '1',
                'default_query_type' => 'most_fields',
                'default_stem_weight' => 3.0,
                // Per-field weights; fields sharing an 'in_dismax' name are grouped together.
                'fields' => [
                    'title' => 0.3,
                    'redirect.title' => [
                        'boost' => 0.27,
                        'in_dismax' => 'redirects_or_shingles',
                    ],
                    'suggest' => [
                        'is_plain' => true,
                        'boost' => 0.20,
                        'in_dismax' => 'redirects_or_shingles',
                    ],
                    'category' => 0.05,
                    'heading' => 0.05,
                    'text' => [
                        'boost' => 0.6,
                        'in_dismax' => 'text_and_opening_text',
                    ],
                    'opening_text' => [
                        'boost' => 0.5,
                        'in_dismax' => 'text_and_opening_text',
                    ],
                    'auxiliary_text' => 0.05,
                    'file_text' => 0.5,
                ],
                'phrase_rescore_fields' => [
                    'all' => 0.06,
                    'all.plain' => 0.1,
                ],
            ],
        ],
    ];
And then activate it by default, using the name of the profile you declared:

    $wgCirrusSearchFullTextQueryBuilderProfile = 'my_custom_profile';
Please see [3] for more doc on the various settings.
Tuning the query-independent signals (the rescoring part in the doc) works the same way: you declare a profile and activate it by default. The config var to add a new profile is $wgCirrusSearchRescoreProfiles, and you can add more by following these examples [4]. The config var to change the default rescore profile is $wgCirrusSearchRescoreProfile. Rescore profiles internally use "rescore function chains", which can be tuned as well using $wgCirrusSearchRescoreFunctionChains [5].
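As a rough sketch of how these three variables fit together: the profile name 'my_rescore_profile' and the chain name 'my_function_chain' below are made up for illustration, and the keys and function types are modeled on the bundled examples in [4] and [5] as I remember them, so please check them against the files shipped with your version before relying on any of this.

```php
// Hypothetical chain name; 'boostlinks' and 'recency' are assumed to be
// function types available in your CirrusSearch version (see [5]).
$wgCirrusSearchRescoreFunctionChains['my_function_chain'] = [
    'functions' => [
        [ 'type' => 'boostlinks' ],
        [ 'type' => 'recency' ],
    ],
];

// Hypothetical profile name; keys assumed to match the examples in [4].
$wgCirrusSearchRescoreProfiles['my_rescore_profile'] = [
    'supported_namespaces' => 'all',
    'rescore' => [
        [
            'window' => 8192,               // number of top hits to rescore
            'query_weight' => 1.0,          // weight of the retrieval query score
            'rescore_query_weight' => 1.0,  // weight of the rescore function score
            'score_mode' => 'multiply',
            'type' => 'function_score',
            'function_chain' => 'my_function_chain',
        ],
    ],
];

// Make the new profile the default rescore profile.
$wgCirrusSearchRescoreProfile = 'my_rescore_profile';
```

The values here are placeholders rather than recommendations; the point is only that the profile refers to the chain by name, and the default setting refers to the profile by name.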
I'm sorry if this is a bit dense, and for the lack of comprehensive documentation. I suggest having a look at the Elasticsearch documentation as well, since many concepts here are related to Elasticsearch features (dismax, rescoring, function score, ...). We also have some integration with the LTR plugin [6].
Please let me know if you have specific questions or a specific problem, so that I can point you in a particular direction instead of leaving you to digest all of this.
Thank you.
[2] https://www.mediawiki.org/wiki/Extension:CirrusSearch/Scoring
[3] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSe...
[4] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSe...
[5] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSe...
[6] https://github.com/o19s/elasticsearch-learning-to-rank