Hey David,

Thanks for starting this discussion!

On 22 January 2016 at 13:53, David Causse <dcausse@wikimedia.org> wrote:
http://en-suggesty.wmflabs.org/suggest.html is updated with a score that integrates pageviews.

Pageviews solve most of the problems we encountered in the previous formula; unfortunately, we now see some porn-related suggestions.
- x will suggest xxx
- po will suggest pornhub just below poland in 2nd position. And is ranked #6 for the query 'p'

As of right now, neither of these queries does this any more. "x" now suggests "Xinjiang" as the top result, and "po" now suggests "Pope Francis" after "Poland"... which may or may not be more palatable than Pornhub, depending on your viewpoints and ideals! Generally, Wikipedians like to point out that Wikipedia is not censored. That said, it's still worth considering whether these suggestions are appropriate. I personally don't have much of a problem with the fact that certain search results might be a little offensive... but I do think that they're probably not that useful either.

Given how volatile this has made our search results, my sense is that we're giving page view data too much weight in the ranking. Is it as simple as tweaking a coefficient so that page views are still taken into consideration but with lower weight, or do we need to do something more involved? I created T124722 to track this work, and added it to our list of blockers for a wider rollout of the suggester.
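
To make the coefficient question concrete, here's a rough sketch of what I have in mind. This is not the suggester's actual formula, which I haven't reproduced here. The function names, the log scaling, the linear blend, and all the numbers are assumptions purely for illustration:

```python
import math


def suggestion_score(base_score, page_views, max_page_views, pv_weight):
    """Blend a base score with log-scaled page views.

    Hypothetical formula: pv_weight is the coefficient in [0, 1]
    controlling how much page views influence the final score.
    """
    if not 0.0 <= pv_weight <= 1.0:
        raise ValueError("pv_weight must be between 0 and 1")
    if max_page_views > 0:
        # Log scaling dampens the effect of extremely popular pages.
        pv_norm = math.log1p(page_views) / math.log1p(max_page_views)
    else:
        pv_norm = 0.0
    return (1.0 - pv_weight) * base_score + pv_weight * pv_norm


def rank(candidates, pv_weight):
    """Return candidate names ordered by descending blended score.

    Each candidate is a (name, base_score, page_views) tuple.
    """
    max_pv = max(pv for _, _, pv in candidates)
    ordered = sorted(
        candidates,
        key=lambda c: suggestion_score(c[1], c[2], max_pv, pv_weight),
        reverse=True,
    )
    return [name for name, _, _ in ordered]


# Made-up candidates: "A" has a strong base score but modest traffic,
# "B" has a weak base score but very high traffic.
candidates = [("A", 0.9, 1_000), ("B", 0.3, 1_000_000)]

heavy = rank(candidates, pv_weight=0.8)  # page views dominate
light = rank(candidates, pv_weight=0.2)  # base score dominates
print(heavy, light)
```

With these invented inputs, the high-traffic page ranks first at the heavier weight and drops to second at the lighter one. If the real formula behaves anything like this, lowering one coefficient might be enough; if page views enter in a less separable way, we'd need something more involved.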

Thanks!

Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation