We created a ticket for this on iOS as soon as Joaquin gave his presentation at the hackathon.
Since we have the search schema set up, we can swap out the service and check the stats for the new version. (It would be nice to have an additional field in the schema to specify the "search service", but we could just query the specific version in which the search service was changed to get the data we need.)
While anecdotal comparisons of which service produces the "better" results are useful for evaluating the algorithm, the real test is in the analytics. If Read More click-through goes up, it's better; if it goes down, it's worse.
As for tweaking the algorithm, I'd like to keep that all server side and let the clients be dumb. Maybe the API could return a value identifying the algorithm, so we could save that to the analytics for comparison?
On Tue, Jun 2, 2015 at 11:03 AM, Dan Garry dgarry@wikimedia.org wrote:
Nik, can you reflect this work in a task?
If it's something we can knock off quickly which will enable Readership to experiment on a self-serve basis, we should do it.
Thanks, Dan
On 2 June 2015 at 15:06, Nikolas Everett neverett@wikimedia.org wrote:
These are the options we use for the more_like_this query:

    $wgCirrusSearchMoreLikeThisConfig = array(
        'min_doc_freq' => 2, // Minimum number of documents (per shard) that need a term for it to be considered
        'max_query_terms' => 25,
        'min_term_freq' => 2,
        'percent_terms_to_match' => 0.3,
        'min_word_len' => 0,
        'max_word_len' => 0,
    );
Here is the reference for what they mean and any more we might be able to set: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ml...
We only use the "text" field of the articles - no weighting based on, well, anything. See the text field in https://en.wikipedia.org/wiki/Barack_Obama?action=cirrusdump for example.
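For reference, here's a rough sketch (in Python, building the JSON by hand) of the Elasticsearch more_like_this query body those options correspond to, matching only against the plain "text" field. This is an illustration, not the actual request CirrusSearch sends; the seed text is a placeholder.

```python
import json

# Sketch of the more_like_this query body implied by the config above.
# "like_text" is a placeholder; CirrusSearch actually seeds the query
# from the source article rather than a literal string.
mlt_query = {
    "query": {
        "more_like_this": {
            "fields": ["text"],            # only the unweighted article text
            "like_text": "Barack Obama",   # placeholder seed text
            "min_doc_freq": 2,             # term must appear in >= 2 docs per shard
            "max_query_terms": 25,         # cap on interesting terms extracted
            "min_term_freq": 2,            # term must appear >= 2 times in the seed
            "percent_terms_to_match": 0.3, # 30% of extracted terms must match
            "min_word_len": 0,             # 0 = no length restriction
            "max_word_len": 0,
        }
    }
}

print(json.dumps(mlt_query, indent=2))
```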
Stuff we could do really, really easily:
1. Add url parameters that override each of those options for easy experimenting.
2. Add url parameters to use different fields like our weighted all field, the wikitext, or intro paragraphs (don't ask how we extract intro paragraphs - it's a horrible hack), or the section headers, or the "secondary" text like the infoboxes and image subtitles.
These are seriously very little work. A couple of hours. A day if we're being really good about testing _and_ someone merges something to core that screws up the tests. If it enables lots of cool experimenting I'm all for doing it.
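To make the first item concrete, per-request overrides could look something like the sketch below. The parameter names here are invented for illustration only; whatever names actually get implemented would be decided in the patch.

```python
from urllib.parse import urlencode

# Hypothetical sketch of url-parameter overrides for the more_like_this
# options. None of these parameter names exist yet; they illustrate the
# idea of tweaking the query per request without a client deploy.
base = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "search",
    "srsearch": "morelike:Barack Obama",
    "cirrusMltMinDocFreq": 3,        # hypothetical: override min_doc_freq
    "cirrusMltMaxQueryTerms": 50,    # hypothetical: override max_query_terms
    "format": "json",
}
url = base + "?" + urlencode(params)
print(url)
```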
Nik
On Tue, Jun 2, 2015 at 9:54 AM, Dan Garry dgarry@wikimedia.org wrote:
On 1 June 2015 at 23:07, Bernd Sitzmann bernd@wikimedia.org wrote:
For the few terms I've tried it on, the morelike: search prefix produced better Read More articles than our old way.
Funny, I kind of found the opposite! So, I suggest running a test.
You could increment the MobileWikiAppArticleSuggestions https://meta.wikimedia.org/wiki/Schema:MobileWikiAppArticleSuggestions schema, removing the "version" field (since it's redundant now anyway) and adding a "suggestionsSource" field. Make a copy of SuggestionsTask https://github.com/wikimedia/apps-android-wikipedia/blob/master/wikipedia/src/main/java/org/wikipedia/page/SuggestionsTask.java that uses the new method to generate results. Bucket users 50/50, half of them getting the old method for suggestions and half getting the new one, and transmit which version they got in the "suggestionsSource" field. Then run analysis to determine which gets users to engage more, and go with the winner! This would make a nice quarterly goal for next quarter, I think. :-)
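The 50/50 bucketing could be as simple as hashing a stable per-install identifier so the same user always lands in the same bucket. A sketch (the function name, bucket labels, and use of an install ID are assumptions, not the actual app code):

```python
import hashlib

def suggestions_source(app_install_id: str) -> str:
    """Deterministically assign a user to one of two suggestion methods,
    50/50, based on a stable per-install identifier. The return value
    is what would be sent in the hypothetical "suggestionsSource" field."""
    digest = hashlib.md5(app_install_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 2
    return "morelike" if bucket == 0 else "old"

# The same install always lands in the same bucket:
print(suggestions_source("example-install-id"))
```

Hashing rather than random assignment keeps each user's experience consistent across sessions, which also keeps the analytics clean.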
Thanks, Dan
-- Dan Garry Product Manager, Search and Discovery Wikimedia Foundation
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
-- Dan Garry Product Manager, Search and Discovery Wikimedia Foundation
reading-wmf mailing list reading-wmf@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/reading-wmf