Cross-posting from wikitech-l. Please reply there.
---------- Forwarded message ----------
From: Dan Garry dgarry@wikimedia.org
Date: 1 September 2015 at 20:43
Subject: Discovery Department A/B testing an alternative to prefix search next week
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Hi everyone,
*tl;dr: Discovery Department to run A/B test https://phabricator.wikimedia.org/T111078 comparing new search suggester to prefix search, to see if it can reduce zero results rate.*
As I'm sure you're all aware, the search box at the top right of every page on desktop uses prefix search to generate its results. The main reason for this is that prefix search is incredibly fast and performant; that search box sees a lot of traffic, and it's important to keep it scalable.
However, we know that there are numerous problems with prefix search. Prefix searches are prone to give you no results; if you make even a slight typo, then you won't get the result you want. And thus a complex system of manually curated redirects was born to try to alleviate this navigation issue. Wouldn't it be nice if we could work towards a solution that doesn't require the manual curation of redirects, thus freeing up Wikimedians to do other more meaningful tasks? And make search a bit better in the process, too? That's a long term goal of mine... emphasis on the long. ;-)
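To make the typo problem concrete, here is a rough Python sketch. It is purely illustrative and is not how the new suggester works: the title list, the function names, and the 0.8 similarity cutoff are all made up for the example, and the fuzzy matching uses Python's standard difflib rather than anything from our search backend.

```python
import difflib

# Hypothetical page titles, for illustration only.
TITLES = ["Albert Einstein", "Alberta", "Albania", "Alan Turing"]


def prefix_search(query, titles=TITLES):
    """Return the titles that start with the query, ignoring case."""
    q = query.casefold()
    return [t for t in titles if t.casefold().startswith(q)]


def fuzzy_prefix_search(query, titles=TITLES, cutoff=0.8):
    """Return the titles whose leading characters are similar to the query.

    The query is compared against the first len(query) characters of each
    title, so a small typo lowers the similarity score instead of ruling
    the title out entirely. The cutoff value is an arbitrary assumption.
    """
    q = query.casefold()
    hits = []
    for title in titles:
        head = title.casefold()[:len(q)]
        if difflib.SequenceMatcher(None, q, head).ratio() >= cutoff:
            hits.append(title)
    return hits


print(prefix_search("Albert Ei"))        # exact prefix: finds the article
print(prefix_search("Albret Ei"))        # transposed letters: zero results
print(fuzzy_prefix_search("Albret Ei"))  # tolerant matching still finds it
```

The second call is the zero-results case that redirects currently paper over by hand.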
The Q1 2015-16 (Jul - Sep 2015) goal of the Search Team in the Discovery Department is to reduce the zero results rate https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals#Search. Amongst other things, we've been working to build an alternative to prefix search https://phabricator.wikimedia.org/T105746. Documentation on the API is pretty light right now because we're scrambling to get it up and running (but there's a task for that! https://phabricator.wikimedia.org/T111139).
An initial version of the suggestion API is now in production on enwiki and dewiki [1], but is currently not being used for anything. Our initial tests https://phabricator.wikimedia.org/T109729 of the API show that it's incredibly promising for reducing the zero results rate. But we need more data!
We're planning on running an A/B test on whether this API is better at reducing zero results. We're targeting beginning on Tuesday 8th September, for two weeks. This is documented in T111078 https://phabricator.wikimedia.org/T111078.
A very important note here is that we currently have no way of quantitatively measuring result relevance (although we're working on it https://phabricator.wikimedia.org/T109482), so this test will be highly limited in scope, only measuring the zero results rate. Given the limits of this, even seeing massive success in this test is not enough to deploy this API as a full replacement of prefix search; we'd need additional data. But, that's not stopping us from gathering initial data from this test.
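For anyone unsure what the metric means in practice, the zero results rate for each test bucket is just the share of queries that returned nothing. A minimal Python sketch follows; the bucket names, the event format, and the sample numbers are invented for illustration and are not our actual logging schema or real data.

```python
from collections import Counter


def zero_results_rate(events):
    """Compute the zero results rate per bucket.

    `events` is an iterable of (bucket, result_count) pairs, one per query.
    Returns a dict mapping each bucket to the fraction of its queries that
    returned no results.
    """
    totals = Counter()
    zeros = Counter()
    for bucket, result_count in events:
        totals[bucket] += 1
        if result_count == 0:
            zeros[bucket] += 1
    return {bucket: zeros[bucket] / totals[bucket] for bucket in totals}


# Made-up sample events: (bucket, number of results returned).
sample_events = [
    ("prefix", 0), ("prefix", 3), ("prefix", 0), ("prefix", 5),
    ("suggester", 2), ("suggester", 0), ("suggester", 4), ("suggester", 1),
]

rates = zero_results_rate(sample_events)
print(rates)
```

Note that this says nothing about whether the results that do come back are any good, which is exactly the limitation described above.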
As always, if you have any questions, let me know.
Thanks, Dan
[1]: The API is actually live on all wikis, but we only built the search indices for enwiki and dewiki since they're our biggest content wikis and this is an early test. Attempting to use the API on any other wiki will get you a cirrus backend error.
wikimedia-search@lists.wikimedia.org