Now that we have the feature deployed (behind a feature flag), and have an initial "does it do anything?" test going out today, along with an upcoming integration with our satisfaction metrics, we need to come up with how we will try to move the needle further.
For reference these are our Q2 goals:
- Run A/B test for a feature that:
  - Uses a library to detect the language of a user's search query.
  - Adjusts results to match that language.
- Determine from A/B test results whether this feature is fit to push to production, with the aim to:
  - Improve search user satisfaction by 10% (from 15% to 16.5%).
  - Reduce zero results rate for non-automata search queries by 10%.
We brainstormed a number of possibilities here:
https://etherpad.wikimedia.org/p/LanguageSupportBrainstorming
We now need to decide which of these ideas we should prioritize. We might want to take into consideration which of these can be pre-tested with our relevancy lab work, so that we can prefer the things we think will move the needle the most. I'm really not sure which of these to push forward on, so let us know which you think could have the most impact, or where the expected impact could be measured in the relevancy lab with minimal work.
Define this "does it do anything?" test?
It measures the zero results rate for 1 in 10 search requests via the CirrusSearchUserTesting log that we used last quarter.
On Mon, Nov 2, 2015 at 6:01 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Define this "does it do anything?" test?
Sorry I didn't respond to this sooner!
I really like the idea of trying to detect what languages the user can read, and searching in (a subset of) those. This wouldn't benefit from relevance lab testing, though. It'll need to be measured against the user satisfaction metric. (BTW, Do we have a sense of how many users have info we can detect for this?)
I think the biggest problem with language detection is the quality of the language detector. The Elastic Search plugin we tested has a Romanian fetish when run on our queries (Erik got about 38% Romanian on 100K enwiki searches, which is crazy, and I got 0% accuracy for Romanian on my much smaller tagged corpus of failed (zero results) queries to enwiki). Most of the time, I would expect queries sent to the wrong wiki to fail (though there are some exceptions)—but a query in English that does get hits in rowiki is going to just look wrong most of the time.
There are several proposals for improving language detection in the etherpad, and we can work on them in parallel, since any given one could be better than any other one. (We don't want to make 100 of them, but a few to test and compare would be nice—there may also be reasonable speed/accuracy tradeoffs to be made, e.g., 2% decrease in accuracy for 2x speed is a good deal.)
We need training and evaluation data. I see a few ways of getting it. The easy, lower-quality way is to just take queries from a given wiki and assume they are in the language in question (i.e., eswiki queries are in Spanish): easy, not 100% accurate, unlimited supply. The hard, higher-quality way is to hand-annotate a corpus of queries. This is slow, but doable. I can do on the order of 1000 queries in a day (more if I were less accurate and more willing to toss stuff into the junk pile), but I couldn't do it for a week straight without going crazy. A possible middle-of-the-road approach would be to create a feedback loop: run detectors on our training data, then review and remove items that are not in the desired language (we could also start by filtering out things that are not in the right character set, like removing all Arabic, Cyrillic, and Chinese from enwiki, frwiki, and eswiki queries). If we want thousands of hand-annotated queries, we need to get annotating!
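To make the character-set filtering idea concrete, here's a rough sketch in Python (the script lists and the wiki mapping below are placeholders I made up, not a vetted set) of dropping training queries whose characters belong to scripts we wouldn't expect on a given wiki:

import unicodedata

# Scripts we would not expect in queries used as training data for these
# wikis; purely illustrative, not an exhaustive or agreed-upon list.
UNEXPECTED_SCRIPTS = {
    "enwiki": ("CYRILLIC", "ARABIC", "CJK", "HANGUL", "HIRAGANA", "KATAKANA"),
    "frwiki": ("CYRILLIC", "ARABIC", "CJK", "HANGUL", "HIRAGANA", "KATAKANA"),
    "eswiki": ("CYRILLIC", "ARABIC", "CJK", "HANGUL", "HIRAGANA", "KATAKANA"),
}

def has_unexpected_script(query, wiki):
    """True if any character's Unicode name mentions a script we don't expect."""
    unexpected = UNEXPECTED_SCRIPTS.get(wiki, ())
    for ch in query:
        name = unicodedata.name(ch, "")
        if any(script in name for script in unexpected):
            return True
    return False

def filter_training_queries(queries, wiki):
    """Keep only queries that could plausibly be in the wiki's own language."""
    return [q for q in queries if not has_unexpected_script(q, wiki)]

# filter_training_queries(["unix time stamp", "Аркадий Гайдар", "数学"], "enwiki")
# -> ["unix time stamp"]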
I think we can use the relevance lab to help evaluate a language detector (at least with respect to the zero results rate). We could run the detector against a pile of zero-results queries, group the queries by detected language, and run them against the relevant wiki (if we have room in labs for the indexes, and if we update the relevance lab tools to support choosing a target wiki to search). We wouldn't be comparing "before" and "after", just measuring the zero results rate against the target wiki. As with any use of the zero-results rate, there's no guarantee that we'd be giving good results, just results (e.g., "unix time stamp" queries with English words fail on enwiki but sometimes work on zhwiki for some reason, and that's not really better).
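For illustration, the measurement itself could be a small script like this (a sketch against the public action API rather than the relevance lab; it assumes the zero-results queries have already been grouped by detected language, and the host name below is just an example):

import requests

def zero_results_rate(queries, target_wiki_host):
    """Fraction of queries that return zero full-text results on the target wiki."""
    api = "https://%s/w/api.php" % target_wiki_host
    zero = 0
    for q in queries:
        resp = requests.get(api, params={
            "action": "query",
            "list": "search",
            "srsearch": q,
            "srlimit": 1,
            "format": "json",
        }, timeout=10)
        # totalhits comes back in the searchinfo block of the standard search API.
        if resp.json()["query"]["searchinfo"]["totalhits"] == 0:
            zero += 1
    return zero / float(len(queries)) if queries else 0.0

# Hypothetical usage: queries the detector tagged as French, re-run against frwiki.
# print(zero_results_rate(["chanson de geste", "mise en abyme"], "fr.wikipedia.org"))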
I'm somewhat worried about being able to reduce the targeted zero results rate by 10%. In my test[1], only 12% of non-DOI zero-results queries were "in a language", and only about a third got results when searched in the "correct" (human-determined) wiki. I didn't filter bots other than the DOI bot, and some non-language queries (e.g., names) might get results in another wiki, but there may not be enough wiggle room. There's a lot of junk in other languages, too, but maybe filtering bots will help more than I dare presume.
[1] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_S...
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Mon, Nov 2, 2015 at 9:03 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
It measures the zero results rate for 1 in 10 search requests via CirrusSearchUserTesting log that we used last quarter.
So do we think we should favor the "try to guess the user's language(s)" item over others that would benefit from the relevance lab? Are there steps we could/should take in advance, such as analyzing whatever user language data we have, or instrumenting to get more if we don't have enough?
Kevin Smith Agile Coach, Wikimedia Foundation
In terms of user language data, the webrequests table in Hive has the Accept-Language header and geolocation information. It also contains the query strings, so we can extract the exact search terms and feed them into the relevancy lab.
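For illustration, the extraction could look roughly like this (a sketch only; the table and column names, wmf.webrequest, uri_query, accept_language, geocoded_data, are from memory and would need checking against the real schema before anyone runs it):

from urllib.parse import parse_qs
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("search-language-signals").getOrCreate()

# Pull raw request query strings plus the language signals for one day of traffic.
rows = spark.sql("""
    SELECT uri_query, accept_language, geocoded_data['country_code'] AS country_code
    FROM wmf.webrequest
    WHERE year = 2015 AND month = 11 AND day = 3
      AND uri_query LIKE '%search=%'
""")

def extract_search_term(uri_query):
    # Pull the search= parameter out of the raw query string.
    params = parse_qs((uri_query or "").lstrip("?"))
    return params.get("search", [None])[0]

extract_udf = udf(extract_search_term, StringType())

# A 1% sample is plenty to feed into relevancy lab experiments.
(rows.withColumn("search_term", extract_udf(rows.uri_query))
     .select("search_term", "accept_language", "country_code")
     .sample(False, 0.01)
     .write.csv("/tmp/search_language_signals", header=True))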
Taking the above into consideration and reviewing what we have in the brainstorming session, the set of ideas seems to be the following:
1. Do language detection on more than just zero-result queries, e.g. queries that only return 1 or 2 results
- Seems useful and doable, but will only affect satisfaction and not the zero result rate. Still possibly worthwhile.
- This should be relatively easy to test with relevancy lab
2. Determine the language to search in via something other than language detection (headers, geolocation, etc)
- Working up a couple heuristics wouldn't be too hard (see the sketch after this list). The webrequests table in hive has the accept language header and geolocation info as well as the query string, so we could extract a set of queries to test with
3. Integrate wikidata search
- This looks to be https://en.wikipedia.org/wiki/MediaWiki:Wdsearch.js
- We could integrate that more directly; it can't be tested by relevancy lab. It is basically just an additional set of results below the existing results.
- Would need a significant cleanup to pass code review, but it's not particularly hard to do
4. Translate the query from the provided language into the language of the wiki being searched on
- This seems "very hard". Not only do we have to correctly detect the language the user input, but then we have to translate that into a second language
- The CX service might be able to provide us a translation endpoint that works with whatever they are currently using, but will likely have high latency. Our inability (currently) to do async requests in PHP makes it harder to hide that latency.
5. Build an index that contains the titles from all wikis, but not much else. This could be used to suggest the user search on other wikis (or to inform the code that does actual searches on other wikis)
- This could be somewhat tested in relevancy lab, but first we would have to build something to actually combine all the titles into the same index.
I think any of the top three could be worked on; the first and the second can be validated through relevancy lab. The third takes a completely different approach and is not easily testable outside of production, but may be useful. The fourth is "very hard" and I think we should leave it alone for now. The fifth and final idea was only put forth once, but is interesting. I'm not sure how valuable it would be, though.
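To make the item 2 heuristics a bit more concrete, here is a minimal sketch in Python (the precedence rules and the country-to-language table are strawman assumptions, not anything we've agreed on) of turning the Accept-Language header plus geolocation into a ranked list of candidate languages to search:

def parse_accept_language(header):
    """Return language codes from an Accept-Language header, highest q-value first.

    e.g. "fr-CH, fr;q=0.9, en;q=0.8" -> ["fr", "en"]
    """
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        if ";q=" in piece:
            lang, q = piece.split(";q=", 1)
            try:
                weight = float(q)
            except ValueError:
                weight = 0.0
        else:
            lang, weight = piece, 1.0
        code = lang.strip().split("-")[0].lower()
        if code and code != "*":
            weighted.append((weight, code))
    seen, ordered = set(), []
    for _, code in sorted(weighted, key=lambda x: -x[0]):
        if code not in seen:
            seen.add(code)
            ordered.append(code)
    return ordered

# Placeholder lookup; a real table would have to be built from data.
COUNTRY_TO_LANG = {"FR": "fr", "DE": "de", "RU": "ru", "JP": "ja"}

def candidate_languages(accept_language, country_code=None, current_wiki_lang="en"):
    """Strawman heuristic: header languages first, then a geolocation guess,
    skipping the language of the wiki the user already searched."""
    ranked = parse_accept_language(accept_language or "")
    geo = COUNTRY_TO_LANG.get((country_code or "").upper())
    if geo and geo not in ranked:
        ranked.append(geo)
    return [lang for lang in ranked if lang != current_wiki_lang]

# candidate_languages("fr-CH, fr;q=0.9, en;q=0.8, *;q=0.5", "FR") -> ["fr"]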
I think the final idea would be very similar to wikidata search.
Concerning wikidata search: could we reuse the code for language search and trigger the search on the backend? The JS code will send a query to wikidata.org (using action=query) and will generate a new search request on wikidata. Analyzing these logs will be hard because we won't be able to associate the original query with the query sent to wikidata.
By importing wikidata into the relevancy lab we could get a rough idea of the impact on ZRR.
Hi!
Concerning wikidata search: could we reuse the code for language search and trigger the search on the backend?
We probably could, though I'm not sure if a little hack we did would work with Wikidata - Wikidata overrides a lot of stuff that regular wikis do not, and our config hack may not be enough. Needs checking.
For the shorter term, doing it client-side, as Wdsearch does, may be faster.
Also, what about performance considerations? Right now I imagine traffic for wikidata search and other APIs is less than what we get for en.wikipedia, but if we make searches go to both, will it be OK?
The JS code will send a query to wikidata.org (using action=query) and will generate a new search request on wikidata. Analyzing these logs will be hard because we won't be able to associate the original query and the query sent to wikidata.
Can't we mark them somehow, like with prov= or something similar? Maybe that mark plus temporal proximity plus a query term match would be enough to group them? Not sure how hard it is in practice; analytics people, please correct me.
I agree that the top 3 are straightforward and seem likely to be beneficial. Translation is definitely very hard, and finding license-compatible libraries that are effective enough and cover enough languages seems daunting. And speed might also be a big concern. Given the scope of non-English queries on enwiki, I don't think it's worth it right now.
The last one is new to me. I kinda like it in theory, though I'll have to mull it over for a while. If building the combined index is prohibitive, could we test it by running intitle: queries (or some such) against the top N likely useful indexes and seeing what hit rate we get?
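Something along these lines might be enough for a first probe (a sketch; the wiki list is a placeholder, and rate limiting and intitle: behavior per language would need checking):

import requests

# Hypothetical "top N likely useful" wikis; the real list would come from data.
TOP_WIKIS = ["en.wikipedia.org", "de.wikipedia.org", "fr.wikipedia.org",
             "es.wikipedia.org", "ru.wikipedia.org", "ja.wikipedia.org"]

def title_hit_rate(queries, wikis=TOP_WIKIS):
    """Fraction of queries matching at least one title on any of the wikis.

    Approximates "would a combined all-titles index have had something" by
    running intitle: searches against each wiki's existing index.
    """
    hits = 0
    for q in queries:
        for host in wikis:
            resp = requests.get("https://%s/w/api.php" % host, params={
                "action": "query",
                "list": "search",
                "srsearch": "intitle:%s" % q,
                "srlimit": 1,
                "format": "json",
            }, timeout=10)
            if resp.json()["query"]["searchinfo"]["totalhits"] > 0:
                hits += 1
                break
    return hits / float(len(queries)) if queries else 0.0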
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Wed, Nov 4, 2015 at 9:07 PM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
Integrate wikidata search
This looks to be https://en.wikipedia.org/wiki/MediaWiki:Wdsearch.js
We could integrate that more directly, can't be tested by relevancy lab. It is basically just an additional set of results below the existing results.
Would need a significant cleanup to pass code review, but it's not particularly hard to do
I don't know what exactly you want to do but please keep in mind the article placeholder we are working on. I showed this to Dan, Stas and Wes already.
Cheers Lydia
Replies inline
On Tue, Nov 3, 2015 at 2:25 PM, Trey Jones tjones@wikimedia.org wrote:
Sorry I didn't respond to this sooner!
I really like the idea of trying to detect what languages the user can read, and searching in (a subset of) those. This wouldn't benefit from relevance lab testing, though. It'll need to be measured against the user satisfaction metric. (BTW, Do we have a sense of how many users have info we can detect for this?)
I think the biggest problem with language detection is the quality of the language detector. The Elastic Search plugin we tested has a Romanian fetish when run on our queries (Erik got about 38% Romanian on 100K enwiki searches, which is crazy, and I got 0% accuracy for Romanian on my much smaller tagged corpus of failed (zero results) queries to enwiki). Most of the time, I would expect queries sent to the wrong wiki to fail (though there are some exceptions)—but a query in English that does get hits in rowiki is going to just look wrong most of the time.
There are several proposals for improving language detection in the etherpad, and we can work on them in parallel, since any given one could be better than any other one. (We don't want to make 100 of them, but a few to test and compare would be nice—there may also be reasonable speed/accuracy tradeoffs to be made, e.g., 2% decrease in accuracy for 2x speed is a good deal.)
My worry here is that we would then need to productionize it. Several of the options I see are basically libraries that we would have to build a service (or ES plugin) around. I do think we should investigate this and decide if the effort to productionize is worth the impact we are able to estimate in the relevance lab.
We need training and evaluation data. I see a few ways of getting it. The easy, lower-quality way is just take queries from a given wiki and assume they are in the language in question (i.e., eswiki queries are in Spanish). Easy, not 100% accurate, unlimited supply. The hard, higher-quality way is to hand annotate a corpus of queries. This is slow, but doable. I can do on the order of 1000 queries in a day—more if I were less accurate and more willing to toss stuff into the junk pile. I couldn't do it for a week straight, though, without going crazy. A possible middle of the road approach would be to create a feedback loop and run detectors on our training data and review and remove items that are not in the desired language (we could also start by filtering things that are not in the right character set, like removing all Arabic, Cyrillic, and Chinese from enwiki, frwiki, and eswiki queries). If we want thousands of hand-annotated queries, we need to get annotating!
This is probably the biggest sticking point. Another random idea: we have speakers of several languages on the team and in the foundation (as in, under NDA and able to review queries that are PII). Would it be enough to grab example queries from wikis in the correct language and then have someone who knows the language filter through them and delete nonsensical / wrong-language queries? I'm guessing this would go faster, but I'm not sure it's as valuable.
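For the character-set pre-filter mentioned above, a minimal sketch (assuming Python, and covering only the basic Unicode blocks for Arabic, Cyrillic, and CJK; a real filter would need more ranges):

import re

# Basic Arabic, Cyrillic, and CJK blocks only; extend as needed.
NON_LATIN = re.compile(r"[\u0400-\u04FF\u0600-\u06FF\u4E00-\u9FFF]")

def looks_latin(query):
    return not NON_LATIN.search(query)

# filtered = [q for q in raw_enwiki_queries if looks_latin(q)]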
I think we can use the relevance lab to help evaluate a language detector (at least with respect to zero results rate). We could run the detector against a pile of zero-results queries, then group the queries by detected language, and run them against the relevant wiki (if we have room in labs for the indexes, and we update the relevance lab tools to support choosing a target wiki to search). We wouldn't be comparing "before" and "after", but just measuring the zero results rate against the target wiki. As always when we're using the zero-results rate, there's no guarantee that we'll be giving good results, just results (e.g., "unix time stamp" queries with English words fail on enwiki but sometimes work on zhwiki for some reason, but that's not really better).
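As an illustration of that measurement (this is not relevance-lab code; it just hits the public search API, and the grouping-by-detected-language step is assumed to happen elsewhere):

import requests

def total_hits(query, host):
    # host is e.g. "es.wikipedia.org"; uses the public search API.
    r = requests.get(
        "https://" + host + "/w/api.php",
        params={"action": "query", "list": "search", "srsearch": query,
                "srlimit": 1, "format": "json"},
        timeout=10,
    )
    return r.json()["query"]["searchinfo"]["totalhits"]

def zero_results_rate(queries, host):
    zero = sum(1 for q in queries if total_hits(q, host) == 0)
    return zero / len(queries) if queries else 0.0

# zero_results_rate(queries_detected_as_spanish, "es.wikipedia.org")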
I'm somewhat worried about being able to reduce the targeted zero results rate by 10%. In my test[1], only 12% of non-DOI zero-results queries were "in a language", and only about a third got results when searched in the "correct" (human-determined) wiki. I didn't filter bots other than the DOI bot, and some non-language queries (e.g., names) might get results in another wiki, but there may not be enough wiggle room. There's a lot of junk in other languages, too, but maybe filtering bots will help more than I dare presume.
I'm also worried about that portion, but perhaps a nuanced reading could help us? If a 10% increase in satisfaction is 15% -> 16.5%, then a 10% reduction in ZRR is 30% -> 27%. We don't yet have the numbers for non-automata, so it's harder to say exactly what it is, but we finally have the data in Hadoop, which should make it possible to determine non-automata-related issues.
[1] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_S...
Trey Jones Software Engineer, Discovery Wikimedia Foundation
Trimming down my reply to certain topics...
On Wed, Nov 4, 2015 at 3:20 PM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
On Tue, Nov 3, 2015 at 2:25 PM, Trey Jones tjones@wikimedia.org wrote:
There are several proposals for improving language detection in the etherpad, and we can work on them in parallel
My worry here is that we would then need to productionize it. Several of the options I see are basically libraries that we would have to build a service (or ES plugin) around. I do think we should investigate this and decide whether the effort to productionize is worth the impact we are able to estimate in the relevance lab.
Yep—I always had language detection and translation as use cases in mind when thinking about the relevance lab. We can test a lot of stuff without productionizing it, which means it's less work to try stuff out and we don't have to commit early.
We need training and evaluation data.
This is probably the biggest sticking point. Another random idea: we have speakers of several languages on the team and in the foundation (as in, under NDA and able to review queries that are PII). Would it be enough to grab example queries from wikis in the correct language and then have someone who knows the language filter through them and delete nonsensical / wrong-language queries? I'm guessing this would go faster, but I'm not sure it's as valuable.
This is a good idea if people are willing to do it, and it's faster and easier if you have only two buckets ("this language" and "not this language") because anything you don't recognize automatically goes into "not this language". You don't have to be a great speaker of the language to do a good job, either.
We also need to think about whether we want general language identification, or if we want to tailor it per wiki for better results. At the most coarse-grained, think "Romanian" on enwiki vs rowiki. But there is also the matter of what languages actually appear in queries on each wiki. So, should we limit to the 10 most common non-English query languages on enwiki? (So we can correctly say "your query is in X but didn't get results on X wiki"?) Or the 10 most likely to get results on the right wiki? (So we can give more results.) Limiting the scope limits the data we need to collect, and increases precision (and probably recall) for enwiki, but the resulting detector can't be used on other wikis (and probably can't be used without modification on other wikis that are in English!), though the training data can be reused.
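A sketch of what per-wiki tailoring could look like; the language sets below are placeholders, not the actual top-10 lists for any wiki:

# Placeholder language sets; the real lists would come from query analysis.
ALLOWED_LANGS = {
    "enwiki": {"es", "fr", "de", "pt", "ru", "ja", "ar", "zh", "it", "pl"},
    "eswiki": {"en", "pt", "fr", "de", "it"},
}

def tailored_detect(detect, query, wiki):
    guess = detect(query)  # detect() is whatever general detector we pick
    return guess if guess in ALLOWED_LANGS.get(wiki, set()) else "unknown"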
We should probably talk through this a bit more.
I'm somewhat worried about being able to reduce the targeted zero results rate by 10%. In my test, only 12% of non-DOI zero-results queries were "in a language", and only about a third got results when searched in the "correct" (human-determined) wiki. I didn't filter bots other than the DOI bot, and some non-language queries (e.g., names) might get results in another wiki, but there may not be enough wiggle room. There's a lot of junk in other languages, too, but maybe filtering bots will help more than I dare presume.
I'm also worried about that portion, but perhaps a nuanced reading could help us? If a 10% increase in satisfaction is 15% -> 16.5%, then a 10% reduction in ZRR is 30% -> 27%. We don't yet have the numbers for non-automata, so it's harder to say exactly what it is, but we finally have the data in Hadoop, which should make it possible to determine non-automata-related issues.
Yeah, we need to be able to effectively sample what we want to affect so we can gauge how well anything we try actually works.
Trey Jones Software Engineer, Discovery Wikimedia Foundation
Summarising this discussion, it seems like the path forward which would reap the most rewards is as follows:
1. Finish the MVP of the relevance lab; right now we can only test the zero results rate for any given experiment, and the lab will help us also test result relevance.
2. Start writing tests to switch out the language detector used in the first test with alternative ones, to see if they're better.
   - This should affect the zero results rate, so lack of the relevance lab does not block this.
   - This should also affect relevance (at least conceptually), so it can be tested using the relevance lab as well.
3. Write a test to use the accept-language header as a heuristic to do language switching (rather than language detection); see the sketch after this list.
   - This should affect the zero results rate, so lack of the relevance lab does not block this.
   - This should also affect relevance (at least conceptually), so it can be tested using the relevance lab as well.
4. Expand the original language switching test to also switch if there are "few" results (let's say "few" = 3 or fewer).
   - Does not really affect the zero results rate; this is dependent on the relevance lab.
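A rough sketch of the accept-language heuristic in item 3 (illustrative only; the production logic would live in CirrusSearch, and the header parsing here is simplified):

def preferred_fallback_lang(accept_language, wiki_lang):
    # Return the highest-weighted language in the header that isn't the
    # wiki's own content language, or None.
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(";q=")
        lang = lang.split("-")[0].lower()
        try:
            weight = float(q) if q else 1.0
        except ValueError:
            weight = 1.0
        candidates.append((weight, lang))
    for _, lang in sorted(candidates, reverse=True):
        if lang != wiki_lang:
            return lang
    return None

# preferred_fallback_lang("de-DE,de;q=0.9,en;q=0.7", "en") -> "de"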
Any objections to this course of action? I plan to file tasks for these mid-Monday morning.
Thanks, Dan
Seems reasonable to me. I'm not sure what to do with 1 & 2 yet, so I've started pulling queries out of Hive for 3 (the accept-language stuff).
Erik B.
I thought the MVP of the relevance lab could only test zero-results. Isn't there a fair bit more effort required for it to also be able to test some measure of "relevance"?
Hopefully I'm mistaken.
Kevin Smith Agile Coach, Wikimedia Foundation
The current relevance lab stuff can kinda sorta do more than zero results, but it does nothing fancy with it. It basically pretty-prints the JSON results and outputs HTML diffs of the old vs. new results. Currently the summary surfaces a couple of the diffs that had the largest change.
I'm not sure how far things would have to be taken to do full relevance comparisons (essentially the annotated corpus, if I understand correctly). I think there is some middle ground, but I'm not sure where it is.
On Mon, Nov 9, 2015 at 12:59 PM, Kevin Smith ksmith@wikimedia.org wrote:
I thought the MVP of the relevance lab could only test zero-results. Isn't there a fair bit more effort required for it to also be able to test some measure of "relevance"?
Hopefully I'm mistaken.
Just including zero results rate was the baseline plan for the MVP, but I tried to make it maximally general, so it's a special case of an abstract metrics class, and I included 4 metrics, none of which are very complex: counting queries (so an empty string doesn't count as a "query"), zero results, top-5 diff ordered (i.e., the top 5 moved around or were replaced), and top-5 unordered (i.e., any of the top 5 were kicked out of the top 5, but shuffling doesn't matter).
There's a top-N ordered or unordered class, so changing or adding a different number is trivial. The main work for new metrics is writing a function that determines whether the metric applies, given the two JSON blobs; there's also some easy stuff like stating whether it's symmetrical and setting some output parameters for examples.
So, metrics that don't require human thought (like, total results, or changes in order) are easy to add. Metrics that require human thought and external annotations (like, is this particular desired pageId included) are harder.
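For illustration only (these are not the actual relevance-lab classes), here are the two top-N comparisons described above, applied to lists of page IDs extracted from the before/after JSON blobs:

def top_n_changed_ordered(before_ids, after_ids, n=5):
    # True if the top n results differ at all, including pure reordering.
    return before_ids[:n] != after_ids[:n]

def top_n_changed_unordered(before_ids, after_ids, n=5):
    # True only if something entered or left the top n; reshuffling is ignored.
    return set(before_ids[:n]) != set(after_ids[:n])

# top_n_changed_ordered([1, 2, 3], [2, 1, 3])   -> True
# top_n_changed_unordered([1, 2, 3], [2, 1, 3]) -> False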
I'm thinking about language detection and preferred result annotations and how to include them and diff them.
I filed epics for each of these:
- T118278 https://phabricator.wikimedia.org/T118278 - EPIC: Run additional A/B tests for the language switching functionality, with different libraries to detect the query's language, and evaluate if the other libraries are better
- T118280 https://phabricator.wikimedia.org/T118280 - EPIC: Run A/B test to determine whether using the accept-language header of the user to switch query languages is good or not
- T118281 https://phabricator.wikimedia.org/T118281 - EPIC: Run original language switching A/B test, but switch languages if the user has fewer than n results (for some n), and determine if that's better or worse
We'll be breaking these epics down into specific actionables in the sprint planning meeting in 25 minutes.
Thanks, Dan
There are now a whole bunch of tasks in the Cirrus board (and also some accompanying tasks in the Analysis board). They were thrown together quite quickly, so if there's any uncertainty about their scope, or it's unclear what you need to do to move on a task, please reach out or comment on the task and we can add extra definition.
Thanks, Dan
The short version is that one epic (showing results from more than one wiki) seemed to be quick and easy to implement, so we'll take that directly to A/B testing. The other two will be run through the relevance lab to see if they are promising enough to take to A/B testing.
Kevin Smith Agile Coach, Wikimedia Foundation