On Mon, Nov 9, 2015 at 12:59 PM, Kevin Smith <ksmith@wikimedia.org> wrote:
I thought the MVP of the relevance lab could only test zero results. Isn't there a fair bit more effort required for it to also be able to test some measure of "relevance"?

Hopefully I'm mistaken.

Just including the zero results rate was the baseline plan for the MVP, but I tried to make it maximally general, so it's a special case of an abstract metrics class, and I included 4 metrics, none of which are very complex: counting queries (so an empty string doesn't count as a "query"), zero results, top-5 diff ordered (i.e., the top 5 moved around or were replaced), and top-5 unordered (i.e., any of the top 5 were kicked out of the top 5, but shuffling within it doesn't matter).
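
To make the ordered/unordered distinction concrete, this is roughly what those two checks compute (a sketch only; the "hits"/"pageId" field names are how I'm picturing the result JSON, not necessarily what we actually store):

    # Sketch of the top-5 diff checks over two result blobs. Assumes each
    # blob has an ordered "hits" list whose entries carry a "pageId".
    def top_n_ids(results, n=5):
        """Page IDs of the first n hits, in rank order."""
        return [hit["pageId"] for hit in results.get("hits", [])[:n]]

    def top_n_diff_ordered(a, b, n=5):
        """Fires if the top n changed at all: reordered or replaced."""
        return top_n_ids(a, n) != top_n_ids(b, n)

    def top_n_diff_unordered(a, b, n=5):
        """Fires only if something entered or left the top n; pure
        shuffling within the top n doesn't count."""
        return set(top_n_ids(a, n)) != set(top_n_ids(b, n))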

There's a top-N ordered/unordered class, so changing N or adding another value is trivial. The main work for a new metric is writing a function that determines whether the metric applies, given the two JSON blobs; there's also some easy stuff like stating whether it's symmetric and setting some output parameters for examples.
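
In sketch form, a metric is about this much code (illustrative names again, not the actual classes), with the top-N case showing why other values of N come for free:

    # Hypothetical shape of the abstract metric class.
    class Metric:
        name = "abstract"
        symmetric = True      # does applies(a, b) imply applies(b, a)?
        max_examples = 10     # output parameter: how many examples to keep

        def applies(self, a, b):
            """Given two JSON result blobs, does this metric fire?"""
            raise NotImplementedError

    class TopNUnordered(Metric):
        def __init__(self, n=5):
            self.n = n
            self.name = "top-%d unordered" % n

        def applies(self, a, b):
            # reuses the helper from the sketch above
            return top_n_diff_unordered(a, b, self.n)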

So, metrics that don't require human thought (like total results, or changes in order) are easy to add. Metrics that require human thought and external annotations (like whether a particular desired pageId is included) are harder.
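
For the annotated case, I'd imagine something like this hypothetical metric, which can't fire until a human has supplied a query -> desired pageId map:

    # Hypothetical annotation-backed metric: needs a human-curated map
    # from query string to the pageId a searcher should have found.
    class DesiredPageIncluded(Metric):
        name = "desired page included"
        symmetric = False     # we care which side gained or lost the page

        def __init__(self, annotations):
            self.annotations = annotations   # {query: pageId}

        def applies(self, a, b):
            # assumes the blob records the query it was run for
            wanted = self.annotations.get(a.get("query"))
            if wanted is None:
                return False  # unannotated query: metric can't apply
            in_a = wanted in top_n_ids(a)
            in_b = wanted in top_n_ids(b)
            return in_a != in_b  # fires when the change gained or lost it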

I'm thinking about language detection and preferred-result annotations, and how to include and diff them.