Maybe that nuance was there, but what I was trying to say is that ***raw*** pageview numbers themselves should not be the factor (whatever percentage of the total you apply). Instead, use some calculation based on pageviews combined with other factors, e.g. the order of magnitude of the pageviews, so that every page within the same range gets the same smoothing factor.
If you are saying that pageviews make up approximately a quarter of the score, that seems a very large share. Two typed letters, "po...", have many possible completions, and the fact that pornhub comes up early because of the pageview factor is ... ummm... thought-provoking. I would not expect 1/4 of searches for "po..." to be for pornhub, though I am not aware that such data is available.
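A rough sketch of the order-of-magnitude idea, assuming the pageview count is a non-negative integer: the hypothetical helper `smoothed_pageview_factor` below returns the number of decimal digits, i.e. floor(log10(pageviews)) + 1, so all pages in the same decade of traffic get the same factor. This is an illustration only, not the formula used by the suggester.

```python
def smoothed_pageview_factor(pageviews: int) -> int:
    """Hypothetical helper: bucket a raw pageview count by order of magnitude.

    Returns the number of decimal digits of the count, which equals
    floor(log10(pageviews)) + 1 for positive counts. Pages with 52,000
    and 87,000 views both land in bucket 5, so small differences in raw
    traffic cannot reorder suggestions on their own.
    """
    if pageviews < 1:
        return 0  # no recorded views: lowest bucket
    return len(str(pageviews))  # exact integer digit count, no float rounding


# Usage: two pages in the same traffic range get the same factor.
print(smoothed_pageview_factor(52_000), smoothed_pageview_factor(87_000))
```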
Regards, Billinghurst
On Tue, Jan 26, 2016 at 6:30 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
On Mon, Jan 25, 2016 at 11:16 PM, billinghurst <billinghurstwiki@gmail.com> wrote:
For the purpose of this exercise I think that it is completely reasonable for staff/developers to play with the factors and make sure that we are not causing offence through this development. We want the focus to be on the tool and what it can do, not on starting a bunfight that detracts from the goal.
For full production, I do NOT think that it is reasonable for either staff or developers to determine what is or is not offensive, and whether a term should or should not be displayed. That determination sits clearly with the community, and is part of a discussion to be had when the tool approaches full production and is handed to the community. It is part of what the community can or will need to do.
All that said, page views as a raw number should not be the determining factor for a suggestion. I will add a fuller comment to the Phabricator ticket.
They aren't, and I hope no one was led to believe this was ever the intention. Page views are one factor. Currently the number of incoming wikilinks, outgoing wikilinks, external links, redirects, headings and the size of the article all have different weights. Page views are being added as another factor; the current WIP patch uses page views as ~23% of the final score (if my math is right).
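To make "page views as ~23% of the final score" concrete, here is a minimal sketch assuming the score is a weighted average of per-page factors, each already normalized to [0, 1]. The factor names mirror the list above, but the weights are hypothetical, picked only so that the page view share happens to land near 23%; they are NOT the values in the WIP patch.

```python
# Hypothetical weights for illustration only; the real patch's values may differ.
WEIGHTS: dict[str, float] = {
    "incoming_links": 3.0,
    "outgoing_links": 1.0,
    "external_links": 1.0,
    "redirects": 2.0,
    "headings": 1.0,
    "size": 2.0,
    "pageviews": 3.0,
}


def suggestion_score(features: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-page factors, each assumed normalized to [0, 1].

    Missing factors count as 0. The result is also in [0, 1].
    """
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


def factor_share(name: str, weights: dict[str, float] = WEIGHTS) -> float:
    """Largest fraction of the score that a single factor can contribute."""
    return weights[name] / sum(weights.values())


# Usage: with these made-up weights, page views account for 3/13 of the score.
print(round(factor_share("pageviews"), 3))
```

Under this assumption a page with maximal page views but nothing else still scores about 0.23, which is the sense in which one factor "is ~23%" of the total.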
Regards, Billinghurst
On Tue, Jan 26, 2016 at 9:37 AM, Dan Garry <dgarry@wikimedia.org> wrote:
Hey David,
Thanks for starting this discussion!
On 22 January 2016 at 13:53, David Causse <dcausse@wikimedia.org> wrote:
http://en-suggesty.wmflabs.org/suggest.html is updated with a score that integrates pageviews.
Pageviews solve most of the problems we encountered with the previous formula; unfortunately, we now see some porn-related suggestions.
- x will suggest xxx
- po will suggest pornhub in 2nd position, just below poland, and it is ranked #6 for the query 'p'
As of right now, neither of these queries does this anymore. "x" now suggests "Xinjiang" as the top result, and "po" now suggests "Pope Francis" after "Poland"... which may or may not be more palatable than Pornhub, depending on your viewpoints and ideals! Generally, Wikipedians like to point out that Wikipedia is not censored. That said, it's still worth considering whether this is appropriate or not. I personally don't have much of a problem with the fact that certain search results might be a little offensive... but I do think that they're probably also not really that useful.
Given how volatile this has made our search results, my sense is that we're giving page view data too much weight in the ranking. Is it as simple as tweaking a coefficient so that page views are still taken into consideration but with lower weight, or do we need to do something more involved? I created T124722 to track this work, and added it to our list of blockers for a wider rollout of the suggester.
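If the score really is a plain weighted average (an assumption; the thread does not confirm the exact formula), "tweaking a coefficient" reduces to picking the page view weight that gives it a chosen share of the total. Solving w / (W_other + w) = s for w gives w = s · W_other / (1 − s). The helper below is hypothetical and only illustrates that arithmetic.

```python
def weight_for_target_share(other_weights_total: float, target_share: float) -> float:
    """Hypothetical helper: weight a factor needs to make up `target_share` of the total.

    Assumes the score divides by the sum of all weights, so a factor's share
    is w / (W_other + w). Solving for w gives w = s * W_other / (1 - s).
    """
    if not 0.0 <= target_share < 1.0:
        raise ValueError("target_share must be in [0, 1)")
    return target_share * other_weights_total / (1.0 - target_share)


# Usage: if the other factors' weights sum to 10, a 10% share needs w of about 1.11.
w = weight_for_target_share(10.0, 0.10)
print(round(w, 3), round(w / (10.0 + w), 3))
```

This only rescales the factor's influence; it does not address the nonlinearity Billinghurst raises, which would need a transform such as log-scaling the raw counts.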
Thanks!
Dan
-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery