On Thu, Feb 18, 2016 at 4:00 PM, Jon Katz <jkatz(a)wikimedia.org> wrote:
Can someone on this list point me to where the more-like code sits? Or,
better yet, is someone documenting the rules that govern the prioritization
of suggestions?
I would like to document the logic for our communities so that we can have
an open discussion about what variables and weighting we should use to
suggest articles.
"More like" is an Elasticsearch
<https://en.wikipedia.org/wiki/Elasticsearch> feature; the documentation is
here
<https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html>.
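For readers who want the gist of the "prioritization" logic without reading the Elasticsearch source, here is a simplified sketch of the term-selection step that a more-like-this query performs. It is modeled on Lucene's `MoreLikeThis` class, which scores each candidate term from the source document by term frequency times a classic TF-IDF idf. It is not the actual Elasticsearch code. The default thresholds below (`max_query_terms=25`, `min_term_freq=2`, `min_doc_freq=5`) are the documented MLT defaults as I understand them; check the linked documentation for your version. The function name and inputs are hypothetical.

```python
import math
from collections import Counter


def select_mlt_terms(doc_tokens, doc_freq, num_docs,
                     max_query_terms=25, min_term_freq=2,
                     min_doc_freq=5, max_doc_freq=None,
                     min_word_length=0):
    """Simplified sketch: pick the terms a more-like-this query would search for.

    doc_tokens: analyzed tokens of the source document (hypothetical input).
    doc_freq:   mapping term -> number of indexed documents containing it.
    num_docs:   total number of documents in the index.
    """
    tf = Counter(doc_tokens)
    scored = []
    for term, freq in tf.items():
        # Drop terms that are too rare in the source document or too short.
        if freq < min_term_freq or len(term) < min_word_length:
            continue
        df = doc_freq.get(term, 0)
        # Drop terms that are unknown or too rare across the index...
        if df == 0 or df < min_doc_freq:
            continue
        # ...or too common across the index (if an upper bound is set).
        if max_doc_freq is not None and df > max_doc_freq:
            continue
        # Classic Lucene TF-IDF idf: 1 + ln(N / (df + 1)).
        idf = 1.0 + math.log(num_docs / (df + 1))
        scored.append((freq * idf, term))
    # Highest tf*idf first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_query_terms]]


tokens = "jazz piano jazz piano jazz the the the quartet".split()
df = {"jazz": 50, "piano": 200, "the": 9000, "quartet": 30}
print(select_mlt_terms(tokens, df, num_docs=10000))
```

In this toy example "quartet" is dropped because it appears only once in the source text, and "the" ranks last because it occurs in almost every document. In the real query, the selected terms are combined into a disjunctive (OR) query, subject to `minimum_should_match`, and candidate articles are then scored by the normal relevance function.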
I'd imagine the source code is way too complicated to offer much insight to
a casual reader (Elasticsearch is a large and complex piece of software),
but I have never looked into the ES codebase, so that's just a guess.
The configuration we use for morelike queries is here
<https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/CirrusSearch.php#L450>.
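To show where configuration settings like these plug in, here is a minimal sketch that builds an Elasticsearch `more_like_this` query body. The parameter names (`fields`, `like`, `min_doc_freq`, `max_query_terms`, `min_term_freq`, `minimum_should_match`) are real MLT options. The index name, page id, field list, and values are hypothetical placeholders, not the values in CirrusSearch.php. The exact shape of the `like` clause also varies between Elasticsearch versions, so check the linked documentation for the version in use.

```python
import json


def build_mlt_query(index, page_id, fields, settings, size=3):
    """Build a more_like_this search body for one source document.

    index, page_id, fields and settings are hypothetical inputs; settings
    holds MLT tuning knobs such as min_doc_freq or max_query_terms.
    """
    mlt = {
        "fields": list(fields),
        # Use an already-indexed document as the "like" source.
        "like": [{"_index": index, "_id": page_id}],
    }
    mlt.update(settings)
    return {"size": size, "query": {"more_like_this": mlt}}


# Placeholder values for illustration only; the real ones live in CirrusSearch.php.
settings = {
    "min_doc_freq": 2,
    "max_query_terms": 25,
    "min_term_freq": 2,
    "minimum_should_match": "30%",
}
body = build_mlt_query("enwiki_content", "12345", ["text"], settings)
print(json.dumps(body, indent=2))
```

Changing the weighting discussed above mostly comes down to changing these knobs (and which fields are compared), not the wrapper code.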
The wrapper code that fires the ES query is here
<https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/includes/Searcher.php#L800>
(but at a glance it doesn't do anything interesting).
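At its core, "firing the query" just means POSTing a JSON body to the cluster's `_search` endpoint. CirrusSearch itself is PHP and does not use this code. The Python sketch below only illustrates the HTTP request involved, and the host URL and index name are hypothetical.

```python
import json
import urllib.request


def make_search_request(base_url, index, body):
    """Build (but do not send) a POST to Elasticsearch's _search endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_search(request, timeout=10):
    """Send the request and return the parsed JSON response (needs a live cluster)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical host and index; the body could be any query, e.g. a more_like_this one.
req = make_search_request("http://localhost:9200/", "enwiki_content",
                          {"query": {"match_all": {}}})
```

Calling `run_search(req)` against a running cluster returns the hits; all the ranking happens inside Elasticsearch, which is why the wrapper has little of interest in it.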