Re: [WikimediaMobile] Similar articles feature performance in CirrusSearch for apps and mobile web

19 Feb 2016

Thanks, both! This clarifies a lot. I think the primary issue that editors
had raised, and that I had hoped to explore, was popularity/importance vs.
obscurity.

Specifically, there have been concerns that the results tilt towards more
popular articles (here
<https://www.mediawiki.org/wiki/Topic:Swjyfj59pkjfol7m> and here
<https://www.mediawiki.org/wiki/Topic:Sxy84nxinxqqld2i>), but it seems that
page traffic is not a variable.  Instead, what seems to be happening is
that the raw # of similar terms is being used, rather than the % of similar
terms.  This means that longer articles are favored.  Is that a fair
assessment?
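
To make the question concrete, here is a toy sketch of the difference between counting shared terms and taking the share of an article's terms that match. It only illustrates the hypothesis above and is not a description of how Elasticsearch actually scores results; the function names and the sample term lists are made up for illustration.

```python
def shared_terms(source, candidate):
    """Distinct terms that appear in both term lists."""
    return set(source) & set(candidate)


def raw_overlap(source, candidate):
    """Raw number of shared distinct terms."""
    return len(shared_terms(source, candidate))


def fractional_overlap(source, candidate):
    """Shared terms as a fraction of the candidate's distinct terms."""
    candidate_terms = set(candidate)
    if not candidate_terms:
        return 0.0
    return len(shared_terms(source, candidate)) / len(candidate_terms)


# Hypothetical term lists, not real article data.
source = "river bridge stone arch span".split()
short_article = "stone arch bridge".split()
long_article = (
    "river bridge stone arch city history economy population "
    "climate transport culture sport"
).split()

for name, terms in [("short", short_article), ("long", long_article)]:
    print(name, raw_overlap(source, terms), round(fractional_overlap(source, terms), 2))
```

In this made-up case the long article shares more terms in absolute numbers, while the short article shares a larger fraction of its terms, so the two measures would rank them in opposite order.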

-J

On Thu, Feb 18, 2016 at 4:15 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...
  On Thu, Feb 18, 2016 at 4:00 PM, Jon Katz <jkatz(a)wikimedia.org> wrote:

  Can someone on this list point me to where the more-like code sits? Or,
 better yet, could someone document the rules that govern the
 prioritization of suggestions?

 I would like to document the logic for our communities so that we can
 have an open discussion about what variables and weighting we should use to
 suggest articles.

 "More like" is an Elasticsearch
 <https://en.wikipedia.org/wiki/Elasticsearch> feature; the
 documentation is here

<https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html>.
 I'd imagine the source code is way too complicated to give much insight to
 the casual reader (as Elasticsearch is a large and complex piece of
 software) but I never looked into the ES codebase so that's just a guess.
 The configuration we use for morelike queries is here

<https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/CirrusSearch.php#L450>.
 The wrapper code that fires the ES query is here

<https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/includes/Searcher.php#L800>
 (but at a glance it doesn't do anything interesting).
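
 For readers who have not seen one, this is a rough sketch of what a "more like this" request body can look like, based on the Elasticsearch documentation linked above. The helper name, the field name and the parameter values are assumptions chosen for illustration; they are not the values from the CirrusSearch configuration, and the wrapper code may build its query differently.

```python
import json


def build_more_like_this_query(like_text, fields,
                               min_term_freq=1,
                               min_doc_freq=2,
                               max_query_terms=25):
    """Build a request body for an Elasticsearch more_like_this query.

    The default values here are placeholders, not CirrusSearch's settings.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like_text,
                # Ignore terms that occur fewer times than this in the input.
                "min_term_freq": min_term_freq,
                # Ignore terms that occur in fewer documents than this.
                "min_doc_freq": min_doc_freq,
                # Upper bound on how many terms are selected for the query.
                "max_query_terms": max_query_terms,
            }
        }
    }


# Hypothetical field name and input text.
body = build_more_like_this_query("stone arch bridge over a river", ["text"])
print(json.dumps(body, indent=2))
```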
