Nik, can you reflect this work in a task?
If it's something we can knock off quickly which will enable Readership to experiment on a self-serve basis, we should do it.
Thanks, Dan
On 2 June 2015 at 15:06, Nikolas Everett neverett@wikimedia.org wrote:
These are the options we use for the more_like_this query:

$wgCirrusSearchMoreLikeThisConfig = array(
    'min_doc_freq' => 2, // Minimum number of documents (per shard) that need a term for it to be considered
    'max_query_terms' => 25,
    'min_term_freq' => 2,
    'percent_terms_to_match' => 0.3,
    'min_word_len' => 0,
    'max_word_len' => 0,
);
Here is the reference for what they mean and any more we might be able to set: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ml...
We only use the "text" field of the articles - no weighting based on, well, anything. See the text field in https://en.wikipedia.org/wiki/Barack_Obama?action=cirrusdump for example.
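To make the shape of the request concrete, here is a rough sketch of a more_like_this query body built from those options and restricted to the "text" field. It is only an illustration: the function name, the "like_text" key, and the idea that the options are passed through under the same names as in the config are assumptions, and the exact request CirrusSearch sends is not shown in this thread. Check the parameter names against the Elasticsearch reference before relying on them.

```python
import json

# Option values copied from $wgCirrusSearchMoreLikeThisConfig in this thread.
MORE_LIKE_THIS_CONFIG = {
    "min_doc_freq": 2,
    "max_query_terms": 25,
    "min_term_freq": 2,
    "percent_terms_to_match": 0.3,
    "min_word_len": 0,
    "max_word_len": 0,
}


def build_more_like_this_query(like_text, fields=("text",),
                               config=MORE_LIKE_THIS_CONFIG):
    """Return a more_like_this query body as a plain dict.

    Assumption: option names are passed through exactly as they appear in
    the config; Elasticsearch may expect different spellings for some.
    """
    body = {"fields": list(fields), "like_text": like_text}
    body.update(config)
    return {"query": {"more_like_this": body}}


if __name__ == "__main__":
    # Print the request body for inspection; nothing is sent anywhere.
    print(json.dumps(build_more_like_this_query("example article text"), indent=2))
```

Keeping the options in one dict means an experiment only has to swap the dict, which is what the overrides discussed below would do.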
Stuff we could do really, really easily:
1. Add url parameters that override each of those options for easy experimenting.
2. Add url parameters to use different fields like our weighted all field, the wikitext, or intro paragraphs (don't ask how we extract intro paragraphs - it's a horrible hack), or the section headers, or the "secondary" text like the infoboxes and image subtitles.
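A minimal sketch of what those two kinds of url parameter could look like. Everything named here is hypothetical: the "mlt_" parameter prefix, the "mlt_fields" parameter, and the field labels in ALLOWED_FIELDS are placeholders for illustration, not real CirrusSearch parameters or index field names.

```python
from urllib.parse import urlparse, parse_qs

# Defaults copied from the config in this thread.
DEFAULTS = {
    "min_doc_freq": 2,
    "max_query_terms": 25,
    "min_term_freq": 2,
    "percent_terms_to_match": 0.3,
    "min_word_len": 0,
    "max_word_len": 0,
}

PARAM_PREFIX = "mlt_"  # hypothetical url parameter prefix
# Placeholder labels for the fields mentioned above, not real field names.
ALLOWED_FIELDS = {"text", "all", "wikitext", "intro", "headings", "secondary"}


def options_from_url(url, defaults=DEFAULTS):
    """Return the defaults with any well-formed url overrides applied."""
    params = parse_qs(urlparse(url).query)
    options = dict(defaults)
    for name, default in defaults.items():
        values = params.get(PARAM_PREFIX + name)
        if not values:
            continue
        try:
            # Coerce to the type of the default (int or float).
            options[name] = type(default)(values[-1])
        except ValueError:
            pass  # malformed value: keep the default
    return options


def fields_from_url(url, default=("text",)):
    """Return the requested fields, ignoring any that are not allowed."""
    params = parse_qs(urlparse(url).query)
    requested = params.get(PARAM_PREFIX + "fields", [""])[-1].split(",")
    fields = [f for f in requested if f in ALLOWED_FIELDS]
    return fields or list(default)
```

Falling back to the default on a malformed or unknown value keeps a bad experiment url from breaking the query, at the cost of silently ignoring typos.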
These are seriously very little work. A couple of hours. A day if we're being really good about testing _and_ someone merges something to core that screws up the tests. If it enables lots of cool experimenting I'm all for doing it.
Nik
On Tue, Jun 2, 2015 at 9:54 AM, Dan Garry dgarry@wikimedia.org wrote:
On 1 June 2015 at 23:07, Bernd Sitzmann bernd@wikimedia.org wrote:
For the few terms I've tried it on, the morelike: search prefix produced better "Read more" articles than our old way
Funny, I kind of found the opposite! So, I suggest running a test.
You could increment the MobileWikiAppArticleSuggestions <https://meta.wikimedia.org/wiki/Schema:MobileWikiAppArticleSuggestions> schema, removing the "version" field (since it's redundant now anyway) and adding a "suggestionsSource" field. Make a copy of SuggestionsTask <https://github.com/wikimedia/apps-android-wikipedia/blob/master/wikipedia/src/main/java/org/wikipedia/page/SuggestionsTask.java> which uses the new method to generate results. Bucket users 50/50, half of them getting the old method for suggestions and half of them getting the new method. Transmit which version they got in the "suggestionsSource" field. Run analysis to determine which gets users to engage more, then go with that way! This would make a nice quarterly goal for next quarter, I think. :-)
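The 50/50 bucketing could be sketched roughly as follows. This is an illustration only, not the app's code: the source labels "old" and "morelike", the appInstallID key, and the choice to bucket by hashing an install ID are all assumptions. Only the "suggestionsSource" field name comes from the proposal above.

```python
import hashlib

# Hypothetical labels for the two suggestion methods.
OLD_SOURCE = "old"
NEW_SOURCE = "morelike"


def suggestions_source(app_install_id):
    """Deterministically assign an install to one of two equal buckets.

    Hashing the ID (instead of drawing a random number per request) keeps
    each install in the same bucket for the whole test.
    """
    digest = hashlib.sha256(app_install_id.encode("utf-8")).digest()
    return NEW_SOURCE if digest[0] % 2 else OLD_SOURCE


def build_event(app_install_id):
    """Build the part of the logged event that records the bucket."""
    return {
        "appInstallID": app_install_id,  # hypothetical key name
        "suggestionsSource": suggestions_source(app_install_id),
    }
```

The analysis can then group logged events by "suggestionsSource" and compare engagement between the two groups.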
Thanks, Dan
-- Dan Garry Product Manager, Search and Discovery Wikimedia Foundation
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search