I thought the MVP of the relevance lab could only test zero results. Isn't there a fair bit more effort required for it to also be able to test some measure of "relevance"?

Hopefully I'm mistaken.



Kevin Smith
Agile Coach, Wikimedia Foundation


On Mon, Nov 9, 2015 at 8:53 AM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Seems reasonable to me. I'm not sure what to do with 1 & 2 yet, so I've started pulling queries out of Hive for 3 (the accept-language stuff).

Erik B.

On Sun, Nov 8, 2015 at 9:51 PM, Dan Garry <dgarry@wikimedia.org> wrote:
Summarising this discussion, it seems like the path forward which would reap the most rewards is as follows:
  1. Finish the MVP of the relevance lab; right now we can only test zero results rate for any given experiment, and the lab will help us also test result relevance.
  2. Start writing tests that swap the language detector used in the first test for alternative ones, to see if they perform better.
    • This should affect the zero results rate, so lack of the relevance lab does not block this
    • This should also affect relevance (at least conceptually), so can be tested using the relevance lab also
  3. Write a test that uses the accept-language header as a heuristic for language switching (rather than language detection).
    • This should affect the zero results rate, so lack of the relevance lab does not block this
    • This should also affect relevance (at least conceptually), so can be tested using the relevance lab also
  4. Expand original language switching test to also switch if there are "few" results (let's say "few" = 3 or fewer).
    • Does not really affect zero results rate; this is dependent on relevance lab
Any objections to this course of action? I plan to file tasks for these mid-Monday morning.
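
For illustration, items 3 and 4 could be combined roughly as in the sketch below. This is only a sketch under stated assumptions, not how the deployed feature works: the function names, the `wiki_language` parameter, and the decision to treat zero results as a special case of "few" results are all hypothetical. Only the threshold of 3 comes from the list above.

```python
from typing import List, Optional

# "few" = 3 or fewer, taken from item 4; everything else here is hypothetical.
FEW_RESULTS_THRESHOLD = 3


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, best first.

    Tags are ordered by descending q-value; ties keep header order.
    Wildcards and tags with q=0 are dropped.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a missing q parameter means q=1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        if tag and tag != "*" and quality > 0:
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]


def pick_fallback_language(
    header: str, wiki_language: str, result_count: int
) -> Optional[str]:
    """Pick a language to retry the search in, or None to keep the results.

    Assumption: we only switch when the original search returned "few"
    results, and we pick the user's most preferred language that differs
    from the language of the wiki being searched.
    """
    if result_count > FEW_RESULTS_THRESHOLD:
        return None
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "en-us" -> "en"
        if primary != wiki_language:
            return primary
    return None
```

With this sketch, a query on an English wiki that returns no results for a user sending `en-US,en;q=0.9,de;q=0.8` would be retried in German, while a query returning four or more results would be left alone.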

Thanks,
Dan

On 2 November 2015 at 16:58, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Now that we have the feature deployed (behind a feature flag), and have an initial "does it do anything?" test going out today, along with an upcoming integration with our satisfaction metrics, we need to decide how we will try to move the needle further.

For reference these are our Q2 goals:
  • Run A/B test for a feature that:
    • Uses a library to detect the language of a user's search query.
    • Adjusts results to match that language.
  • Determine from A/B test results whether this feature is fit to push to production, with the aim to:
    • Improve search user satisfaction by 10% (from 15% to 16.5%).
    • Reduce zero results rate for non-automata search queries by 10%.
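
To make the arithmetic in these goals concrete, here is a minimal sketch. It assumes the 10% figures are relative changes, which matches the "15% to 16.5%" example above. The helper names are hypothetical, and the result counts used in the usage note are made up for illustration, not measured data.

```python
from typing import Sequence


def relative_target(baseline: float, relative_change: float) -> float:
    """Apply a relative change to a baseline, e.g. +10% of 15.0 gives 16.5."""
    return baseline * (1.0 + relative_change)


def zero_results_rate(result_counts: Sequence[int]) -> float:
    """Fraction of queries that returned no results at all."""
    if not result_counts:
        raise ValueError("need at least one query to compute a rate")
    zero = sum(1 for count in result_counts if count == 0)
    return zero / len(result_counts)
```

For example, `relative_target(15.0, 0.10)` gives the 16.5% satisfaction target, and `zero_results_rate([0, 5, 0, 2])` gives 0.5 for four illustrative queries of which two returned nothing.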
We brainstormed a number of possibilities here:
https://etherpad.wikimedia.org/p/LanguageSupportBrainstorming

We now need to decide which of these ideas we should prioritize. We might want to take into consideration which of these can be pre-tested with our relevancy lab work, such that we can prefer to work on things we think will move the needle the most. I'm really not sure which of these to push forward on, so let us know which you think can have the most impact, or where the expected impact could be measured with relevancy lab with minimal work.



_______________________________________________
discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery




--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
