I think the final idea would be very similar to wikidata search.

Concerning wikidata search: could we reuse the code for language search and trigger the search on the backend?
The JS code will send a query to wikidata.org (using action=query), generating a new search request on Wikidata. Analyzing these logs will be hard because we won't be able to associate the original query with the query sent to Wikidata.
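
If we did trigger it on the backend, here is a rough sketch of what the request to wikidata.org could look like, assuming the standard MediaWiki search API (action=query&list=search); the endpoint, parameters, and logging below are illustrative, not the current JS implementation:

    import requests

    # Illustrative only: a server-side Wikidata search via the standard
    # MediaWiki search API (action=query&list=search). Doing it on the
    # backend would let us log the original query and the Wikidata query
    # together, which we cannot do when the browser talks to wikidata.org.
    WIKIDATA_API = "https://www.wikidata.org/w/api.php"

    def wikidata_search(original_query, limit=5):
        """Return Wikidata full-text search hits for a user query."""
        params = {
            "action": "query",
            "list": "search",
            "srsearch": original_query,
            "srlimit": limit,
            "format": "json",
        }
        resp = requests.get(WIKIDATA_API, params=params, timeout=5)
        resp.raise_for_status()
        hits = resp.json().get("query", {}).get("search", [])
        # Hypothetical log line associating both queries for later analysis.
        print("original=%r wikidata_hits=%d" % (original_query, len(hits)))
        return hits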

By importing Wikidata into the relevancy lab we could get a rough idea of the impact on ZRR.

On 04/11/2015 21:07, Erik Bernhardson wrote:
Taking the above into consideration and reviewing what we have in the brainstorming session, the set of ideas seems to be the following:

1. Do language detection on more than just zero result queries; how about queries that only return 1 or 2 results?
  • Seems useful and doable, but will only affect satisfaction and not the zero results rate. Still possibly worthwhile.
  • This should be relatively easy to test with relevancy lab.
2. Determine the language to search in via something other than language detection (headers, geolocation, etc.)
  • Working up a couple of heuristics wouldn't be too hard. The webrequests table in Hive has the accept-language header and geolocation info as well as the query string, so we could extract a set of queries to test with (see the sketch after this list).
3. Integrate wikidata search
  • We could integrate that more directly; it can't be tested by relevancy lab. It is basically just an additional set of results below the existing results.
  • Would need a significant cleanup to pass code review, but it's not particularly hard to do.
4. Translate the query from the provided language into the language of the wiki being searched on
  • This seems "very hard". Not only do we have to correctly detect the language of the user's input, but then we have to translate it into a second language.
  • The CX service might be able to provide us a translation endpoint that works with whatever they are currently using, but it will likely have high latency. Our inability (currently) to do async requests in PHP makes it harder to hide that latency.
5. Build an index that contains the titles from all wikis, but not much else. This could be used to suggest the user search on other wikis (or to inform the code that does actual searches on other wikis).
  • This could be somewhat tested in relevancy lab, but first we would have to build something to actually combine all the titles into the same index.
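
For item 2, a minimal sketch of what a header/geolocation heuristic could look like; the accept-language parsing and the country-to-language table are illustrative assumptions, not a design:

    # Sketch of item 2: pick candidate languages from signals other than
    # query-text language detection. The parsing and the country-to-language
    # table below are illustrative placeholders.
    COUNTRY_LANG = {"FR": "fr", "DE": "de", "ES": "es", "RO": "ro"}  # hypothetical

    def candidate_languages(accept_language, country_code, wiki_lang):
        """Return languages (other than the current wiki's) worth searching."""
        candidates = []
        # An accept-language header looks like "fr-FR,fr;q=0.9,en;q=0.6".
        for part in (accept_language or "").split(","):
            lang = part.split(";")[0].strip().split("-")[0].lower()
            if lang and lang != wiki_lang and lang not in candidates:
                candidates.append(lang)
        geo = COUNTRY_LANG.get((country_code or "").upper())
        if geo and geo != wiki_lang and geo not in candidates:
            candidates.append(geo)
        return candidates

    # Example: an enwiki user with French headers browsing from France.
    # candidate_languages("fr-FR,fr;q=0.9,en;q=0.6", "FR", "en") -> ["fr"]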


I think any of the top three could be worked on; the first and the second can be validated through relevancy lab. The third takes a completely different approach and is not easily testable outside of production, but may be useful. The fourth is "very hard" and I think we should leave it alone for now. The fifth and final idea was only put forth once, but is interesting. I'm not sure how valuable it would be, though.


On Tue, Nov 3, 2015 at 3:55 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
In terms of user language data, the webrequests table in Hive has the accept-language header and geolocation information. It also contains the query strings, so we can extract the exact search terms and feed that information into relevancy lab.
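
For reference, a sketch of what that extraction could look like; I'm assuming the wmf.webrequest table and column names like accept_language, geocoded_data, and uri_query, which should be double-checked:

    # Sketch: extract query strings plus language/geo signals from Hive so
    # they can be fed into relevancy lab. The table and column names
    # (wmf.webrequest, accept_language, geocoded_data, uri_query) are
    # assumptions to verify against the actual schema.
    EXTRACT_QUERIES = """
    SELECT
      uri_query,                                 -- contains the search terms
      accept_language,                           -- browser language preferences
      geocoded_data['country_code'] AS country   -- geolocation
    FROM wmf.webrequest
    WHERE year = 2015 AND month = 11 AND day = 3
      AND uri_query LIKE '%search=%'
    LIMIT 100000
    """

    # Run with the usual Hive tooling and feed the output into relevancy lab,
    # e.g.:  hive -e "<the query above>" > language_test_queries.tsv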

On Tue, Nov 3, 2015 at 3:29 PM, Kevin Smith <ksmith@wikimedia.org> wrote:
So do we think we should favor the "try to guess the user's language(s)" item over others that would benefit from the relevance lab? Are there steps we could/should take in advance, such as analyzing whatever user language data we have, or instrumenting to get more if we don't have enough?



Kevin Smith
Agile Coach, Wikimedia Foundation


On Tue, Nov 3, 2015 at 2:25 PM, Trey Jones <tjones@wikimedia.org> wrote:
Sorry I didn't respond to this sooner!

I really like the idea of trying to detect what languages the user can read, and searching in (a subset of) those. This wouldn't benefit from relevance lab testing, though. It'll need to be measured against the user satisfaction metric. (BTW, do we have a sense of how many users have info we can detect for this?)

I think the biggest problem with language detection is the quality of the language detector. The Elasticsearch plugin we tested has a Romanian fetish when run on our queries (Erik got about 38% Romanian on 100K enwiki searches, which is crazy, and I got 0% accuracy for Romanian on my much smaller tagged corpus of failed (zero results) queries to enwiki). Most of the time, I would expect queries sent to the wrong wiki to fail (though there are some exceptions), but a query in English that does get hits in rowiki is going to just look wrong most of the time.
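
For concreteness, the numbers above come from comparing detector guesses to hand tags; here is a minimal sketch of that comparison, where the detector function and the (query, gold language) corpus format are placeholders:

    from collections import Counter

    # Sketch: per-language behaviour of a detector against a hand-tagged
    # corpus. `detect` stands in for whichever detector we are testing
    # (e.g. the Elasticsearch plugin); the corpus format is a placeholder.
    def per_language_report(corpus, detect):
        """corpus: iterable of (query, gold_lang); detect: query -> guessed lang."""
        correct, gold_counts, guess_counts = Counter(), Counter(), Counter()
        for query, gold in corpus:
            guess = detect(query)
            gold_counts[gold] += 1
            guess_counts[guess] += 1
            if guess == gold:
                correct[gold] += 1
        total = sum(gold_counts.values())
        return {
            lang: {
                "accuracy": correct[lang] / gold_counts[lang],
                "share_of_guesses": guess_counts[lang] / total,
            }
            for lang in gold_counts
        }

    # A detector that guesses Romanian for 38% of enwiki queries while getting
    # 0% of the hand-tagged Romanian queries right shows up immediately here.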

There are several proposals for improving language detection in the etherpad, and we can work on them in parallel, since any given one could be better than any other one. (We don't want to make 100 of them, but a few to test and compare would be nice—there may also be reasonable speed/accuracy tradeoffs to be made, e.g., 2% decrease in accuracy for 2x speed is a good deal.)

We need training and evaluation data. I see a few ways of getting it. The easy, lower-quality way is to just take queries from a given wiki and assume they are in the language in question (i.e., eswiki queries are in Spanish). Easy, not 100% accurate, unlimited supply. The hard, higher-quality way is to hand-annotate a corpus of queries. This is slow, but doable. I can do on the order of 1000 queries in a day, more if I were less accurate and more willing to toss stuff into the junk pile. I couldn't do it for a week straight, though, without going crazy. A possible middle-of-the-road approach would be to create a feedback loop: run detectors on our training data, then review and remove items that are not in the desired language (we could also start by filtering things that are not in the right character set, like removing all Arabic, Cyrillic, and Chinese from enwiki, frwiki, and eswiki queries). If we want thousands of hand-annotated queries, we need to get annotating!
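
The character-set filter is cheap to do; a rough sketch, with coarse and purely illustrative script buckets:

    import unicodedata

    # Sketch of the coarse character-set filter: drop queries whose letters
    # are clearly from the wrong script before anyone hand-reviews them.
    # The script buckets are rough and illustrative, not exhaustive.
    def dominant_script(text):
        """Return a rough script label based on Unicode character names."""
        counts = {"LATIN": 0, "CYRILLIC": 0, "ARABIC": 0, "CJK": 0, "OTHER": 0}
        for ch in text:
            if not ch.isalpha():
                continue
            name = unicodedata.name(ch, "")
            if name.startswith("LATIN"):
                counts["LATIN"] += 1
            elif name.startswith("CYRILLIC"):
                counts["CYRILLIC"] += 1
            elif name.startswith("ARABIC"):
                counts["ARABIC"] += 1
            elif name.startswith(("CJK", "HIRAGANA", "KATAKANA")):
                counts["CJK"] += 1
            else:
                counts["OTHER"] += 1
        return max(counts, key=counts.get)

    def keep_latin_queries(queries):
        """Keep only Latin-script queries (for enwiki/frwiki/eswiki data)."""
        return [q for q in queries if dominant_script(q) == "LATIN"]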

I think we can use the relevance lab to help evaluate a language detector (at least with respect to the zero results rate). We could run the detector against a pile of zero-results queries, then group the queries by detected language, and run them against the relevant wiki (if we have room in labs for the indexes, and we update the relevance lab tools to support choosing a target wiki to search). We wouldn't be comparing "before" and "after", but just measuring the zero results rate against the target wiki. As with any use of the zero-results rate, there's no guarantee that we'll be giving good results, just results (e.g., "unix time stamp" queries with English words fail on enwiki but sometimes work on zhwiki for some reason, but that's not really better).
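
A sketch of that measurement; in the lab it would hit local indexes, but the public search API call below shows the shape of the loop:

    import requests

    # Sketch: group zero-result queries by detected language, search each
    # group against the corresponding wiki, and report how many still get
    # nothing. In relevance lab this would run against local indexes; the
    # public API is used here only to illustrate the loop.
    def still_zero_results(query, lang):
        """True if the query gets no full-text results on the target wiki."""
        api = "https://%s.wikipedia.org/w/api.php" % lang
        params = {"action": "query", "list": "search", "srsearch": query,
                  "srlimit": 1, "format": "json"}
        resp = requests.get(api, params=params, timeout=5)
        resp.raise_for_status()
        return len(resp.json().get("query", {}).get("search", [])) == 0

    def zrr_by_detected_language(zero_result_queries, detect):
        """Zero-results rate per detected language when rerun on that wiki."""
        groups = {}
        for q in zero_result_queries:
            groups.setdefault(detect(q), []).append(q)
        return {lang: sum(still_zero_results(q, lang) for q in qs) / len(qs)
                for lang, qs in groups.items()}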

I'm somewhat worried about being able to reduce the targeted zero results rate by 10%. In my test[1], only 12% of non-DOI zero-results queries were "in a language", and only about a third of those got results when searched in the "correct" (human-determined) wiki, which works out to roughly 4% of zero-results queries rescued, short of the 10% goal. I didn't filter bots other than the DOI bot, and some non-language queries (e.g., names) might get results in another wiki, but there may not be enough wiggle room. There's a lot of junk in other languages, too, but maybe filtering bots will help more than I dare presume.



Trey Jones
Software Engineer, Discovery
Wikimedia Foundation


On Mon, Nov 2, 2015 at 9:03 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
It measures the zero results rate for 1 in 10 search requests via the CirrusSearchUserTesting log that we used last quarter.

On Mon, Nov 2, 2015 at 6:01 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Define this "does it do anything?" test?

On 2 November 2015 at 19:58, Erik Bernhardson
<ebernhardson@wikimedia.org> wrote:
> Now that we have the feature deployed (behind a feature flag), and have an
> initial "does it do anything?" test going out today, along with an upcoming
> integration with our satisfaction metrics, we need to come up with how we
> will try to further move the needle forward.
>
> For reference these are our Q2 goals:
>
> Run A/B test for a feature that:
>
> Uses a library to detect the language of a user's search query.
> Adjusts results to match that language.
>
> Determine from A/B test results whether this feature is fit to push to
> production, with the aim to:
>
> Improve search user satisfaction by 10% (from 15% to 16.5%).
> Reduce zero results rate for non-automata search queries by 10%.
>
> We brainstormed a number of possibilities here:
>
> https://etherpad.wikimedia.org/p/LanguageSupportBrainstorming
>
>
> We now need to decide which of these ideas we should prioritize. We might
> want to take into consideration which of these can be pre-tested with our
> relevancy lab work, such that we can prefer to work on things we think will
> move the needle the most. I'm really not sure which of these to push forward
> on, so let us know which you think can have the most impact, or where the
> expected impact could be measured with relevancy lab with minimal work.



--
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery