Might be interesting to look over
"On average, 20% of a knowledge worker’s day is spent looking for the information they need to get their work done. If you think about a typical work week, that means an entire day is dedicated to this task!"
Interesting way to look at it. Also interesting takes on recent vs. relevant. Thanks for sharing.
On Wed, Feb 8, 2017 at 1:01 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Might be interesting to look over
https://slack.engineering/search-at-slack-431f8c80619e
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery
The short summary of the implementation they used: it seems to be based on a very influential paper from 2002, Optimizing Search Engines with Clickthrough Data http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf. They do a pairwise transform on clickstream data and feed it into an SVM. Interestingly, this model is not widely used in industry. The paper was influential, but the exact methods have fallen out of favor, replaced by ensembles of decision trees trained on explicit labels rather than pairwise click data. Their reasoning seems sound, though: their content is highly siloed, and it's quite rare for queries against the same content to be repeated often enough to allow labeling via click models or humans.
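For anyone who hasn't seen the pairwise approach before, here is a rough sketch of the idea in plain Python. It is not Slack's code and not the exact procedure from the paper. It assumes that, within one query, a clicked result is preferred over every unclicked one, and it trains a bias-free linear SVM with simple subgradient descent on the hinge loss. The two features (recency, text match), the sample values, and the function names are all made up for illustration.

```python
def pairwise_transform(features, clicked):
    """Build difference vectors (preferred minus non-preferred) for one query.

    features: one feature vector per result shown for the query
    clicked:  one bool per result, True if the user clicked it
    Assumption: a clicked result is preferred over each unclicked result.
    """
    diffs = []
    for i, was_clicked in enumerate(clicked):
        if not was_clicked:
            continue
        for j, other_clicked in enumerate(clicked):
            if other_clicked:
                continue  # no preference between two clicked results
            diffs.append([a - b for a, b in zip(features[i], features[j])])
    return diffs


def score(w, x):
    """Linear ranking score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_rank_svm(diffs, dim, epochs=200, lr=0.1, reg=0.01):
    """Fit w so that score(w, diff) >= 1 for each preference pair.

    Minimizes an L2-regularized hinge loss by per-example subgradient steps.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for d in diffs:
            violated = score(w, d) < 1  # pair is inside the margin
            for k in range(dim):
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


# Hypothetical click log: features are [recency, text_match].
query_features = [[0.9, 0.2], [0.1, 0.8], [0.5, 0.1]]
query_clicks = [False, True, False]

diffs = pairwise_transform(query_features, query_clicks)
w = train_rank_svm(diffs, dim=2)

# Rank the results of the query by the learned linear score.
ranked = sorted(query_features, key=lambda x: score(w, x), reverse=True)
print(w, ranked[0])
```

The point of the transform is that the model never needs an absolute relevance label for a document, only a preference between two results shown for the same query, which is why it still works when queries are rarely repeated.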
Interesting stuff.
I think they somewhat misused that 20% statistic, though. I followed that link and it seems like the problem it refers to is tracking down organization-internal info that isn't in a searchable format (hence the suggestion of org-internal social media). People need wikis. ; )
There's also this:
"Although the total size of the text corpus is large, each team’s corpus is relatively small and thus allows us to devote more computational resources to each message during ranking."
That sounds kinda like "might not scale to big Wikipedias", alas. Bummer. They also use a lot of features that aren't applicable to us, unfortunately. But the relevance vs recency thing is indeed thought-provoking.
Thanks for sharing, Erik—much appreciated! —Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation