Re: [WikimediaMobile] Similar articles feature performance in CirrusSearch for apps and mobile web

20 Jan 2016

One thing we could do regarding the quality of the output is check results
against a random sample of popular articles (example approach
<https://phabricator.wikimedia.org/T120504#1900287> to find some articles)
on mdot Wikipedia. Presuming the opening-text change improves the quality of the
recommendations or at least does not degrade them, we should consider
adding the enhancement task to a future sprint, with further
instrumentation and A/B testing / timeboxed beta test, etc.

Joaquin, smaxage (e.g., 24-hour cached responses) does seem like a good fix
for now to further reduce client-perceived wait, at least for non-cold
cache requests, even if we stop beating up the backend. Does anyone know of
a compelling reason not to do that for the time being? The main thing that
comes to mind as always is growing the Varnish cache object pool - probably
not a huge deal while the thing is only in beta, but on the stable channel
maybe noteworthy because it would run on probably most pages (but that's
what edge caches are for, after all).

Erik, from your perspective does use of smaxage relieve the backend
sufficiently?

If we do smaxage, then Web, Android, iOS should standardize their URLs so
we get more cache hits at the edge across all clients. Here's the URL I see
being used on the web today from mobile web beta:

https://en.m.wikipedia.org/w/api.php?action=query&format=json&forma…
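As a rough illustration of what standardizing could look like, here is a minimal Python sketch. It assumes the edge cache keys on the exact URL, so every client would need to emit the same parameters in the same order with the same smaxage value. The helper name `canonical_api_url` and the parameter set below are hypothetical and are not the actual (truncated) request shown above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.m.wikipedia.org/w/api.php"

def canonical_api_url(params, smaxage_seconds=86400):
    """Build an API URL with sorted parameters and a shared smaxage (24 hours by default)."""
    merged = dict(params)
    merged["smaxage"] = str(smaxage_seconds)
    # Sorting removes differences caused by the order in which each client adds parameters.
    return API_ENDPOINT + "?" + urlencode(sorted(merged.items()))

# Illustrative parameter sets only: the same request built in two different orders.
web_url = canonical_api_url(
    {"action": "query", "format": "json", "list": "search", "srsearch": "morelike:Example"}
)
app_url = canonical_api_url(
    {"srsearch": "morelike:Example", "list": "search", "format": "json", "action": "query"}
)
```

Both calls produce the same string, so under the assumption above they would share one cached object.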

-Adam

On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez <
jhernandez(a)wikimedia.org> wrote:

...
 I'd be up for it if we manage to cram it into a following sprint and it is
 worth it.

 We could run a controlled test against production with a long batch of
 articles and check median/percentiles response time with repeated runs and
 highlight the different results for human inspection regarding quality.
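A minimal Python sketch of the summary step for such a test, assuming the response times (in milliseconds) and the result titles for each article have already been collected by some other means. The function names are hypothetical.

```python
import statistics

def summarize_latencies(samples_ms):
    """Return the median, 95th and 99th percentile of response times in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }

def result_differences(baseline_titles, candidate_titles):
    """List titles that appear in only one of the two result sets, for human review."""
    baseline, candidate = set(baseline_titles), set(candidate_titles)
    return {
        "only_baseline": sorted(baseline - candidate),
        "only_candidate": sorted(candidate - baseline),
    }
```

Running `summarize_latencies` once per configuration and per run would give comparable numbers, while `result_differences` flags the articles whose recommendations changed.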

 It's been noted previously that the results are far from ideal (which they
 are because it is just *morelike*), and I think it would be a great idea
 to change the endpoint to a specific one that is smarter and has some cache
 (we could do much more to get relevant results besides text similarity:
 take into account links, or *see also* links if there are any, etc.).

 As a note, in mobile web the related articles extension allows editors to
 specify articles to show in the section, which would avoid queries to
 CirrusSearch if it were used more widely (once rolled into stable I guess).

 I remember that the performance related task was closed as resolved (
 https://phabricator.wikimedia.org/T121254#1907192), should we reopen it
 or create a new one?

 I'm not sure if we ended up adding the smaxage parameter (I think we
 didn't:
 <https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/search?utf8=%E2%9C%93&q=maxage&type=Code>),
 should we? To me it seems a no-brainer that we should be caching these
 results in Varnish since they don't need to be completely up to date for
 this use case.

 On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson <
 ebernhardson(a)wikimedia.org> wrote:

 Both mobile apps and web are using CirrusSearch's morelike: feature, which
 is showing some performance issues on our end. We would like to make a
 performance optimization to it, but first we would prefer to run an A/B
 test to see if the results are still "about as good" as they are currently.

 The optimization is basically this: currently, more like this takes the
 entire article into account; we would like to change it to take only the
 opening text of an article into account. This should reduce the amount of
 work we have to do on the backend, saving both server load and the latency
 the user sees when running the query.

 This can be triggered by adding these two query parameters to the search
 api request that is being performed:

 cirrusMltUseFields=yes&cirrusMltFields=opening_text

 The API will give a warning that these parameters do not exist, but the
 warnings are safe to ignore. Would any of you be willing to run this test? We would
 basically want to look at user perceived latency along with click through
 rates for the current default setup along with the restricted setup using
 only opening_text.
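A minimal Python sketch of how the two test arms could build their requests. The two `cirrusMlt*` parameters are the ones given above; the endpoint, the `list=search` / `srsearch=morelike:` shape of the base request, and the function name `morelike_url` are assumptions for illustration and may differ from what each client actually sends.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"  # assumed endpoint for the example

def morelike_url(title, opening_text_only=False):
    """Build a morelike: search request, optionally restricted to opening_text."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": "morelike:" + title,
    }
    if opening_text_only:
        # The two parameters from the message above; the API warns about them,
        # and per the message those warnings can be ignored.
        params["cirrusMltUseFields"] = "yes"
        params["cirrusMltFields"] = "opening_text"
    return API_ENDPOINT + "?" + urlencode(params)

control_url = morelike_url("Example")
test_url = morelike_url("Example", opening_text_only=True)
```

The control arm keeps the current default behavior, and the test arm differs only by the two added parameters, which keeps the latency and click-through comparison focused on that one change.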

 Erik B.

 _______________________________________________
 Mobile-l mailing list
 Mobile-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l
