Hello,
happy to join the discussion.
I also think that phonetic search would be a really good improvement;
currently you often have to search on Google and then copy and paste.
I am also experimenting with Elasticsearch, and thanks to this thread I
discovered that Wikipedia also uses it, through CirrusSearch. Could phonetic
search be applied only to *link names* (no text) for languages that are
currently not phonetically supported, e.g. Chinese, with the results then
mapped onto ES?
Or maybe ES has its own support?
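
On the question of built-in support: Elasticsearch ships an optional `analysis-phonetic` plugin that provides a `phonetic` token filter. Below is a minimal sketch of index settings that would apply such a filter to link names only. The filter and analyzer names (`link_phonetic`, `link_name_phonetic`) are hypothetical, the plugin is assumed to be installed, and its encoders (Metaphone, Soundex and similar) target Latin-script languages, so this does not by itself cover Chinese.

```python
import json


def phonetic_index_settings(encoder="double_metaphone"):
    """Build index settings with a phonetic analyzer (names are hypothetical).

    Assumes the analysis-phonetic plugin is installed on the cluster.
    The analyzer would then be assigned to a link-name field in the
    index mapping, leaving the article text untouched.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "link_phonetic": {
                        "type": "phonetic",   # provided by the plugin
                        "encoder": encoder,
                        "replace": False,     # keep the original token too
                    }
                },
                "analyzer": {
                    "link_name_phonetic": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "link_phonetic"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(phonetic_index_settings(), indent=2))
```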
On Tue, Jan 26, 2016 at 8:30 AM, Erik Bernhardson <
ebernhardson(a)wikimedia.org> wrote:
On Mon, Jan 25, 2016 at 11:16 PM, billinghurst
<billinghurstwiki(a)gmail.com> wrote:
For the purpose of this exercise I think that it is completely
reasonable for staff/developers to play with the factors and make sure
that we are not having offence caused through this development. We
want the focus to be on the tool, and what it can do; not start a
bunfight and detract from the goal.
For full production, I do NOT think that it is reasonable that either
staff or developers make the determination of what is or what is not
offensive, and whether a term should or should not be displayed. That
determination sits clearly with the community, and is part of a
discussion when the tool approaches full production and given to the
community. It is part of what the community can or will need to do.
All that said, page views as a raw number should not be the
determinator of a suggestion. I will add fuller comment to the
phabricator ticket.
They aren't, and I hope no one was led to believe this was ever the
intention. Page views are one factor. Currently the number of incoming
wikilinks, outgoing wikilinks, external links, redirects, headings and the
size of the article all have different weights. Page views are being added
as another factor; the current WIP patch uses page views as ~23% of the
final score (if my math is right).
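
To make the "~23% of the final score" figure concrete, here is a minimal sketch of a weighted combination of the factors listed above. The individual weights are hypothetical, chosen only so that page views come out at 23% of the total; they are not the values from the WIP patch, and each signal is assumed to be pre-scaled to the range [0, 1].

```python
# Hypothetical relative weights. Only the 23% share for page views comes
# from the thread; every other number here is an illustrative assumption.
WEIGHTS = {
    "incoming_links": 20,
    "outgoing_links": 10,
    "external_links": 10,
    "redirects": 12,
    "headings": 10,
    "size": 15,
    "page_views": 23,
}


def weight_share(weights, name):
    """Fraction of the final score that one factor can contribute."""
    return weights[name] / sum(weights.values())


def combined_score(signals, weights):
    """Weighted average of per-factor signals, each assumed to be in [0, 1]."""
    total = sum(weights.values())
    return sum(w * signals[name] for name, w in weights.items()) / total


if __name__ == "__main__":
    print(f"page view share: {weight_share(WEIGHTS, 'page_views'):.0%}")
```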
Regards, Billinghurst
On Tue, Jan 26, 2016 at 9:37 AM, Dan Garry <dgarry(a)wikimedia.org> wrote:
Hey David,
Thanks for starting this discussion!
On 22 January 2016 at 13:53, David Causse <dcausse(a)wikimedia.org>
wrote:
> that integrates pageviews.
>
> Pageviews solve most of the problems we encountered in the previous
> formula unfortunately we now see some porn related suggestions.
> - x will suggest xxx
> - po will suggest pornhub just below poland in 2nd position. And is
>   ranked #6 for the query 'p'
As of right now, neither of these queries do this any more. "x" now suggests
"Xinjiang" as the top result, and "po" now suggests "Pope Francis" after
"Poland"... which may or may not be more palatable than Pornhub, depending
on your viewpoints and ideals! Generally, Wikipedians like to point out that
Wikipedia is not censored. That said, it's still worth considering whether
this is appropriate or not. I personally don't have much of a problem with
the fact that certain search results might be a little offensive... but I do
think that they're probably also not really that useful.
Given how volatile this has made our search results, my sense is that we're
letting page view data affect the ranking too much. Is it as simple as
tweaking a coefficient so that page views are still taken into consideration
but with lower weight, or do we need to do something more involved? I created
T124722 to track this work, and added it to our list of blockers for a wider
rollout of the suggester.
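
As a rough illustration of the "tweak a coefficient" option, the sketch below blends a structural score with a page view score using a single weight `w` and shows how the ordering of two made-up pages flips as `w` grows. The page names, their scores, and the two-signal simplification are all assumptions for illustration, not data from the suggester.

```python
def blended_score(structure, popularity, w):
    """Linear blend: w is the share given to page views, 1 - w to the rest."""
    return (1.0 - w) * structure + w * popularity


def rank(pages, w):
    """Return page names ordered from highest to lowest blended score."""
    return sorted(
        pages,
        key=lambda name: blended_score(pages[name][0], pages[name][1], w),
        reverse=True,
    )


# Hypothetical pages as (structure score, page view score), both in [0, 1].
PAGES = {
    "page_a": (0.8, 0.2),  # well linked, rarely viewed
    "page_b": (0.4, 0.9),  # poorly linked, heavily viewed
}

if __name__ == "__main__":
    for w in (0.1, 0.23, 0.5):
        print(w, rank(PAGES, w))
```

With these made-up numbers the two pages swap places once `w` passes roughly 0.36, which is the kind of sensitivity a single coefficient would need to be tuned against.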
Thanks!
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery