I'm only a week late to the party—and it's Friday the 13th so anything
goes, right?
Erik wrote:
I would prefer to see us focus on search relevance and improving the
scoring of what we already have before spending more focus on interwiki
search.
David wrote:
Working without a relevancy lab will always lead to discrepancies like
that, the developer will focus on a limited set of 4/5 queries to develop
the feature with a high risk to break previous features. I'd really like
to use the relevancy lab to review existing features.
David's uncovered a number of weird results with the standard search config
(as have others), and while I love to say "the plural of anecdote is not
data", that's David's point: we need to assess performance overall, not
just on the motivating examples.
The relevance lab will let us test a lot of options quickly and relatively
cheaply. I've been thinking about it in my 10% time and I've got a line on
how to handle annotations (including "required result") that even in the
absence of a proper gold standard corpus makes it feasible to collect
examples like David's and use them as "search quality unit tests" to make
sure we don't break things.
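A minimal sketch of the "search quality unit test" idea, assuming a
"required result" annotation is just a query, a title that must appear, and
the worst rank at which it may appear. The names here (RequiredResult,
find_regressions, fake_search) are hypothetical and not part of any
existing relevance lab code, and the canned result list is a made-up
stand-in for a real search backend, not actual search output.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RequiredResult:
    """Hypothetical annotation: `title` must show up in the top `max_rank` hits."""
    query: str
    title: str
    max_rank: int  # 1-based; 3 means "somewhere in the top 3"


def find_regressions(
    annotations: List[RequiredResult],
    search: Callable[[str], List[str]],
) -> List[RequiredResult]:
    """Return every annotation whose required title is missing from the top hits."""
    failures = []
    for ann in annotations:
        top_titles = search(ann.query)[: ann.max_rank]
        if ann.title not in top_titles:
            failures.append(ann)
    return failures


def fake_search(query: str) -> List[str]:
    """Stand-in for a search config under test; the results are invented."""
    canned = {
        "kennedy": ["Kennedy (surname)", "John F. Kennedy", "Kennedy Space Center"],
    }
    return canned.get(query, [])


if __name__ == "__main__":
    checks = [RequiredResult("kennedy", "John F. Kennedy", max_rank=3)]
    for failed in find_regressions(checks, fake_search):
        print(f"REGRESSION: {failed.query!r} should return {failed.title!r} "
              f"in the top {failed.max_rank}")
```

Each collected example becomes one annotation, and swapping fake_search
for a function that queries a candidate config would let the same list be
rerun after every change.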
The cross-language cross-wiki task is endlessly fascinating (to a language
nerd like me), but I worry that the maximum potential impact is low, and
that success is very hard to measure, because the plausible use cases are
so complex (esp. right now—I want inline surveys, dagnabbit!). I think the
same may be true of other cross-wiki searching.
I think this also fits with the overall Discovery vision of first looking
inward and making sure our fundamentals are sound.
Can we talk more about the theory and practice of updating our Q3 goals in
our next weekly meeting?
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Fri, Nov 6, 2015 at 10:45 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
+1 on reviewing existing features. That it is standard does not mean
that it works, and it's nice to be able to pass results back upstream.
On 6 November 2015 at 03:41, David Causse <dcausse(a)wikimedia.org> wrote:
On 05/11/2015 22:56, Erik Bernhardson wrote:
I really want to see us focus on fixing what we already have and
validating the features we already support before we go whole hog on
incorporating all kinds of new data.
Hi,
I totally agree, there are some existing features that need to be reviewed,
tuned or rewritten. Some queries give better results with certain features
disabled:
- kennedy[1] with default features enabled does not bring JFK onto the first
page
- kennedy[2] with some features disabled (all fields, boost links) brings
JFK into the top 3
Working without a relevancy lab will always lead to discrepancies like
that, the developer will focus on a limited set of 4/5 queries to develop
the feature with a high risk to break previous features.
I'd really like to use the relevancy lab to review existing features.
[1]
https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=def…
[2]
https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=def…
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery
--
Oliver Keyes
Count Logula
Wikimedia Foundation