Learning to Rank at Slack

Erik Bernhardson

8 Feb 2017

10:01 a.m.

Might be interesting to look over

https://slack.engineering/search-at-slack-431f8c80619e

Wes Moran

8 Feb 2017

10:08 a.m.

"On average, 20% of a knowledge worker’s day is spent looking for the information they need to get their work done. If you think about a typical work week, that means an entire day is dedicated to this task!"

Interesting way to look at it. Also interesting takes on recent vs. relevant. Thanks for sharing.

discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery

Erik Bernhardson

10:31 a.m.

In short, the implementation they used seems to be based on a very influential paper from 2002, Optimizing Search Engines with Clickthrough Data http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf. They do a pairwise transform on clickstream data and feed it into an SVM. Interestingly, this model is not widely used in industry: the paper was influential, but the exact methods have fallen out of favor, replaced by ensembles of decision trees trained on explicit labels rather than pairwise click data. Their reasoning seems sound though: their content is highly siloed, and it's quite rare for queries against the same content to be repeated often enough to allow labeling via click models or humans.
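For anyone who wants to see the shape of that approach, here is a rough sketch of a pairwise transform over click data feeding a linear SVM. It is not Slack's code. The two features (text match, recency), the document ids, and the tiny subgradient trainer are all made up for illustration, and the click heuristic (a clicked result beats any unclicked result ranked above it) is a simplified reading of the 2002 paper.

```python
import random

def click_preferences(ranking, clicked):
    """A clicked result is assumed preferred over every unclicked
    result that was shown above it (simplified click heuristic)."""
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            prefs.extend((doc, skipped) for skipped in ranking[:pos]
                         if skipped not in clicked)
    return prefs

def pairwise_transform(prefs, features):
    """Turn each (better, worse) preference into two difference vectors
    labelled +1 and -1, so a binary classifier can learn the ordering."""
    examples = []
    for better, worse in prefs:
        diff = [a - b for a, b in zip(features[better], features[worse])]
        examples.append((diff, 1))
        examples.append(([-d for d in diff], -1))
    return examples

def score(w, x):
    """Linear ranking score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(examples, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Toy linear SVM: subgradient descent on L2-regularised hinge loss."""
    rng = random.Random(seed)
    examples = list(examples)
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            violated = y * score(w, x) < 1  # inside the margin
            for i in range(len(w)):
                grad = reg * w[i] - (y * x[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

# Hypothetical data: features are [text_match, recency].
features = {"a": [0.9, 0.1], "b": [0.5, 0.2], "c": [0.4, 0.9]}
ranking = ["a", "b", "c"]   # order originally shown to the user
clicked = {"c"}             # the user skipped a and b, clicked c

prefs = click_preferences(ranking, clicked)
w = train_linear_svm(pairwise_transform(prefs, features))
ranked = sorted(features, key=lambda d: score(w, features[d]), reverse=True)
print(ranked)
```

The learned weights only have to order pairs correctly, which is why this works without any absolute relevance labels. That is the property that matters when queries against the same content are rarely repeated.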

Trey Jones

10:43 a.m.

Interesting stuff.

I think they somewhat misused that 20% statistic, though. I followed that link and it seems like the problem it refers to is tracking down organization-internal info that isn't in a searchable format (hence the suggestion of org-internal social media). People need wikis. ; )

There's also this:

"Although the total size of the text corpus is large, each team’s corpus is relatively small and thus allows us to devote more computational resources to each message during ranking."

That sounds kinda like "might not scale to big Wikipedias", alas. Bummer. They also use a lot of features that aren't applicable to us, unfortunately. But the relevance vs recency thing is indeed thought-provoking.

Thanks for sharing, Erik—much appreciated! —Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
