These are the options we use for the more_like_this query:
$wgCirrusSearchMoreLikeThisConfig = array(
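    'min_doc_freq' => 2,              // Minimum number of documents (per shard) that need a term for it to be considered
    'max_query_terms' => 25,          // Maximum number of terms selected from the source article to build the query
    'min_term_freq' => 2,             // Minimum times a term must appear in the source article to be considered
    'percent_terms_to_match' => 0.3,  // Fraction of the selected terms a result must match (deprecated in ES 1.5 in favour of minimum_should_match)
    'min_word_len' => 0,              // Terms shorter than this are ignored (0 = no minimum)
    'max_word_len' => 0,              // Terms longer than this are ignored (0 = no maximum)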
);

Here is the reference for what they mean, and for any other options we might be able to set: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

We only use the "text" field of the articles, with no weighting based on, well, anything. For an example, see the text field in https://en.wikipedia.org/wiki/Barack_Obama?action=cirrusdump.
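As a rough illustration of how the options above and the single "text" field end up in an Elasticsearch request, here is a PHP sketch that wraps the config into a more_like_this query body. It is not how CirrusSearch actually builds the query: the function name is made up, and it assumes the seed article is sent as raw text under `like_text` (the 1.x-era parameter name; later Elasticsearch versions use `like`).

```php
<?php
// Hypothetical sketch: build a more_like_this query body from the MLT config.
// Assumption: the seed article's text is passed via "like_text" (ES 1.x name).

function buildMoreLikeThisQuery(array $mltConfig, string $seedText, array $fields = ['text']): array {
    return [
        'more_like_this' => array_merge(
            ['fields' => $fields, 'like_text' => $seedText],
            $mltConfig // min_doc_freq, max_query_terms, ... passed through unchanged
        ),
    ];
}

$mltConfig = [
    'min_doc_freq' => 2,
    'max_query_terms' => 25,
    'min_term_freq' => 2,
    'percent_terms_to_match' => 0.3,
    'min_word_len' => 0,
    'max_word_len' => 0,
];

// Only the "text" field is searched, matching the current setup.
$query = buildMoreLikeThisQuery($mltConfig, 'text of the seed article goes here');
echo json_encode(['query' => $query], JSON_PRETTY_PRINT), "\n";
```

Because the config is merged in as-is, any option added to the PHP array flows straight into the request without touching the query-building code.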

Stuff we could do really, really easily:

1. Add URL parameters that override each of those options, for easy experimenting.

2. Add URL parameters to use different fields: our weighted all field, the wikitext, the intro paragraphs (don't ask how we extract paragraphs; it's a horrible hack), the section headers, or the "secondary" text like the infoboxes and image captions.
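Ideas 1 and 2 could be wired up roughly like this. It's a PHP sketch, not CirrusSearch code: the `mlt_` parameter prefix, the `mlt_fields` parameter, and the field names in the whitelist (`all`, `source_text`, `opening_text`, `heading`, `auxiliary_text`, standing in for the weighted all field, wikitext, intro, section headers and secondary text) are all assumptions; check them against a cirrusdump before relying on them.

```php
<?php
// Hypothetical sketch: URL parameters such as ?mlt_min_doc_freq=5 override
// single more_like_this options, and ?mlt_fields=heading,all picks the fields.
// Parameter names and the field whitelist are illustrative assumptions.

const MLT_ALLOWED_FIELDS = ['text', 'all', 'source_text', 'opening_text', 'heading', 'auxiliary_text'];

function applyMltOverrides(array $defaults, array $params): array {
    $config = $defaults;
    foreach ($defaults as $name => $default) {
        $key = 'mlt_' . $name;
        if (!isset($params[$key]) || !is_numeric($params[$key])) {
            continue; // Missing or non-numeric override: keep the default.
        }
        // Preserve the default's type: 0.3 stays a float, 25 stays an int.
        $config[$name] = is_float($default) ? (float) $params[$key] : (int) $params[$key];
    }
    return $config;
}

function selectMltFields(array $params): array {
    if (!isset($params['mlt_fields']) || !is_string($params['mlt_fields'])) {
        return ['text']; // Current behaviour: only the "text" field.
    }
    $requested = array_map('trim', explode(',', $params['mlt_fields']));
    // Drop anything not on the whitelist and reindex the result.
    $fields = array_values(array_intersect($requested, MLT_ALLOWED_FIELDS));
    return $fields === [] ? ['text'] : $fields;
}
```

Usage would be along the lines of `applyMltOverrides($wgCirrusSearchMoreLikeThisConfig, $_GET)` plus `selectMltFields($_GET)`. Falling back to the defaults on bad input keeps a typo in the URL from producing a broken query, and the whitelist stops arbitrary field names from reaching Elasticsearch.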

These really are very little work: a couple of hours, or a day if we're being really good about testing _and_ someone merges something to core that breaks the tests. If it enables lots of cool experimenting, I'm all for doing it.

Nik

On Tue, Jun 2, 2015 at 9:54 AM, Dan Garry <dgarry@wikimedia.org> wrote:

On 1 June 2015 at 23:07, Bernd Sitzmann <bernd@wikimedia.org> wrote:
For the few terms I've tried it on, the morelike: search prefix produced better "Read more" articles than our old way.

Funny, I kind of found the opposite! So, I suggest running a test.

You could run it like this:

1. Increment the MobileWikiAppArticleSuggestions schema, removing the "version" field (since it's redundant now anyway) and adding a "suggestionsSource" field.

2. Make a copy of SuggestionsTask that uses the new method to generate results.

3. Bucket users 50/50, half of them getting the old method for suggestions and half of them getting the new method. Transmit which version they got in the "suggestionsSource" field.

4. Run analysis to determine which one gets users to engage more, then go with that one!

This would make a nice quarterly goal for next quarter, I think. :-)
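The bucketing and logging step of that test could look roughly like the sketch below. It's written in PHP only to match the config earlier in the thread (the app would do the equivalent in its own code). Only the schema name and the `suggestionsSource` field come from this thread; the install ID, the `action` field, and the bucket labels `legacy` and `morelike` are made up for illustration.

```php
<?php
// Hypothetical sketch of the proposed A/B test: assign each user to one of two
// buckets deterministically and report the bucket in the logged event.
// The install ID, "action" field and bucket labels are assumptions.

function suggestionsBucket(string $appInstallId): string {
    // crc32() is stable across runs, so a given install always lands in the
    // same bucket; the checksum's parity gives a roughly 50/50 split.
    return crc32($appInstallId) % 2 === 0 ? 'legacy' : 'morelike';
}

function buildSuggestionsEvent(string $appInstallId, string $action): array {
    return [
        'schema' => 'MobileWikiAppArticleSuggestions',
        'event' => [
            'appInstallID' => $appInstallId,
            'action' => $action,
            // Which method produced the suggestions this user saw.
            'suggestionsSource' => suggestionsBucket($appInstallId),
        ],
    ];
}
```

Hashing a stable ID rather than picking a random bucket on each launch keeps every user on the same method for the whole test, so engagement can be compared cleanly between the two groups.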

Thanks,
Dan

--
Dan Garry
Product Manager, Search and Discovery
Wikimedia Foundation

_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search