Both the mobile apps and mobile web are using CirrusSearch's morelike: feature, which is showing some performance issues on our end. We would like to make a performance optimization to it, but first we would prefer to run an A/B test to see if the results are still "about as good" as they are currently.
The optimization is basically this: currently "more like this" takes the entire article into account; we would like to change it to take only the opening text of an article into account. This should reduce the amount of work we have to do on the backend, saving both server load and the latency the user sees when running the query.
This can be triggered by adding these two query parameters to the search API request being performed:
cirrusMltUseFields=yes&cirrusMltFields=opening_text
The API will give a warning that these parameters do not exist, but it is safe to ignore. Would any of you be willing to run this test? We would basically want to compare user-perceived latency and click-through rates between the current default setup and the restricted setup using only opening_text.
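To make the request concrete, here is a sketch of building such a request URL. The article title "Chess" and the base search parameters are illustrative; the two cirrusMlt* parameters are the ones quoted above.

```shell
# Sketch: build an experimental morelike request URL.
# "Chess" and the base query parameters are example values.
BASE='https://en.wikipedia.org/w/api.php'
QUERY='action=query&list=search&format=json&srsearch=morelike:Chess'
EXPERIMENT='cirrusMltUseFields=yes&cirrusMltFields=opening_text'
URL="${BASE}?${QUERY}&${EXPERIMENT}"
echo "$URL"
```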
Erik B.
I'd be up for it if we can fit it into an upcoming sprint and it is worth it.
We could run a controlled test against production with a large batch of articles, check median/percentile response times over repeated runs, and flag the differing results for human inspection of quality.
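The latency side of that test reduces to computing percentiles over a file of per-request timings (one value per line, as produced by e.g. `curl -so /dev/null -w '%{time_total}\n' "$URL"`). A minimal sketch, using made-up sample timings and the nearest-rank percentile method:

```shell
# Sketch: compute median and p95 from a file of request latencies
# (seconds, one per line). The sample values are made up.
printf '0.12\n0.34\n0.20\n0.18\n0.55\n0.25\n0.30\n0.22\n0.40\n0.15\n' > latencies.txt
sort -n latencies.txt | awk '
  {a[NR] = $1}
  END {
    median = a[int((NR + 1) / 2)]
    p95    = a[int(NR * 0.95 + 0.999)]   # nearest-rank (ceiling) index
    printf "median=%s p95=%s\n", median, p95
  }'
```

With the sample values above this prints `median=0.22 p95=0.55`.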
It's been noted previously that the results are far from ideal (which they are, because it is just *morelike*), and I think it would be a great idea to move to a dedicated endpoint that is smarter and has some caching (we could do much more than text similarity to get relevant results: take links into account, or *see also* links where present, etc.).
As a note, on mobile web the RelatedArticles extension allows editors to specify the articles shown in the section, which would avoid queries to CirrusSearch if it were used more (once rolled into stable, I guess).
I remember that the performance-related task was closed as resolved (https://phabricator.wikimedia.org/T121254#1907192); should we reopen it or create a new one?
I'm not sure if we ended up adding the smaxage parameter (I think we didn't: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/search?utf8=%E2%9C%93&q=maxage&type=Code); should we? It seems a no-brainer to me that we should be caching these results in Varnish, since they don't need to be completely up to date for this use case.
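For reference, smaxage and maxage are standard api.php parameters that set the s-maxage / max-age values on the response's Cache-Control header, which is what lets Varnish cache the response. A sketch of what such a request could look like; the 86400-second TTL is just an example value for discussion, not a decided number:

```shell
# Sketch: a morelike request with cache-control parameters appended.
# smaxage/maxage are real api.php parameters; 86400 is an example TTL.
URL='https://en.m.wikipedia.org/w/api.php?action=query&format=json&srsearch=morelike:Chess&smaxage=86400&maxage=86400'
# To inspect the resulting header on a live request (not run here):
#   curl -sI "$URL" | grep -i '^cache-control'
echo "$URL"
```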
On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
One thing we could do regarding the quality of the output is check results against a random sample of popular articles on mdot Wikipedia (see https://phabricator.wikimedia.org/T120504#1900287 for an example approach to finding such articles). Assuming that improves the quality of the recommendations, or at least does not degrade it, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / a timeboxed beta test, etc.
Joaquin, smaxage (e.g., 24-hour cached responses) does seem like a good fix for now to further reduce client-perceived wait, at least for warm-cache requests, even once we stop beating up the backend. Does anyone know of a compelling reason not to do that for the time being? The main concern that comes to mind, as always, is growing the Varnish cache object pool - probably not a huge deal while the feature is only in beta, but on the stable channel it may be noteworthy, because it would run on most pages (but that's what edge caches are for, after all).
Erik, from your perspective does use of smaxage relieve the backend sufficiently?
If we do smaxage, then web, Android, and iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used today from mobile web beta:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
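One way clients could converge on byte-identical URLs (and therefore share Varnish cache objects) is to normalize the query string before sending it, e.g. by sorting the key=value pairs. A sketch; the helper name is ours, not an existing API:

```shell
# Sketch: canonicalize a query string so all platforms emit the same URL.
# Splits on '&', sorts the key=value pairs, and re-joins them.
canonicalize_query() {
  printf '%s' "$1" | tr '&' '\n' | sort | paste -sd '&' -
}
canonicalize_query 'format=json&action=query&srsearch=morelike:Chess'
# prints: action=query&format=json&srsearch=morelike:Chess
```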
-Adam
On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test, we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
Out of interest, is it also possible to take article size and type into account and not return any morelike results for things like disambiguation pages and stubs?
[1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso abaso@wikimedia.org wrote:
Hi,
Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others. Today Cirrus uses mostly the number of incoming links, which (imho) is not very good for morelike. On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash script to compare results: https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad. Here are some random results from the list (sometimes better, sometimes worse):
$ sh morelike.sh Revolution_Muslim
Defaults
"title": "Chess",
"title": "Suicide attack",
"title": "Zachary Adam Chesser",
=======
Opening text no boost links
"title": "Hungarian Revolution of 1956",
"title": "Muslims for America",
"title": "Salafist Front",

$ sh morelike.sh Chesser
Defaults
"title": "Chess",
"title": "Edinburgh",
"title": "Edinburgh Corn Exchange",
=======
Opening text no boost links
"title": "Dreghorn Barracks",
"title": "Edinburgh Chess Club",
"title": "Threipmuir Reservoir",

$ sh morelike.sh Time_%28disambiguation%29
Defaults
"title": "Atlantis: The Lost Empire",
"title": "Stargate",
"title": "Stargate SG-1",
=======
Opening text no boost links
"title": "Father Time (disambiguation)",
"title": "The Last Time",
"title": "Time After Time",
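For readers who don't want to open the gist, a sketch of what a comparison script like morelike.sh might look like (the gist is the authoritative version): fetch morelike results with the defaults and with opening_text only, printing just the titles.

```shell
# Sketch of a defaults-vs-opening_text comparison helper.
# The gist linked above is the real script; this is an approximation.
morelike_compare() {
  api='https://en.wikipedia.org/w/api.php'
  common="action=query&list=search&format=json&srlimit=3&srsearch=morelike:$1"
  echo 'Defaults'
  curl -s "${api}?${common}" | grep -o '"title":"[^"]*"'
  echo '======='
  echo 'Opening text only'
  curl -s "${api}?${common}&cirrusMltUseFields=yes&cirrusMltFields=opening_text" \
    | grep -o '"title":"[^"]*"'
}
# usage: morelike_compare Revolution_Muslim
```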
On 20/01/2016 at 19:34, Jon Robson wrote:
Regarding the caching, apps and web would need to agree on the URL and the smaxage parameter, as Adam noted, so that the URLs are *exactly* the same: that way we reuse the same cached objects across platforms and don't bloat Varnish.
It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
20% of the traffic coming from searches while this is only in Android and web beta seems like a lot to me, and we should work on reducing it; otherwise, when this hits web stable we're going to crush the servers. So caching seems the highest priority.
Let's chime in on https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should we proceed? Adam?
On Wed, Jan 20, 2016 at 9:34 PM, David Causse dcausse@wikimedia.org wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
Regarding the caching, we would need to agree between apps and web about the url and smaxage parameter as Adam noted so that the urls are *exactly* the same to not bloat varnish and reuse the same cached objects across platforms.
It is an extremely adhoc and brittle solution but seems like it would be the greatest win.
20% of the traffic from searches by being only in android and web beta seems a lot to me, and we should work on reducing it, otherwise when it hits web stable we're going to crush the servers, so caching seems the highest priority.
To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)
Let's chime in https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should we proceed? Adam?
I've created https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates between the two approaches.
On Wed, Jan 20, 2016 at 9:34 PM, David Causse dcausse@wikimedia.org wrote:
Hi,
Yes we can combine many factors, from templates (quality but also disambiguation/stubs), size and others. Today cirrus uses mostly the number of incoming links which (imho) is not very good for morelike. On enwiki results will also be scored according the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash to compare results : https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad Here is some random results from the list (Semetimes better, sometimes worse) :
$ sh morelike.sh Revolution_Muslim Defaults "title": "Chess", "title": "Suicide attack", "title": "Zachary Adam Chesser", ======= Opening text no boost links "title": "Hungarian Revolution of 1956", "title": "Muslims for America", "title": "Salafist Front",
$ sh morelike.sh Chesser Defaults "title": "Chess", "title": "Edinburgh", "title": "Edinburgh Corn Exchange", ======= Opening text no boost links "title": "Dreghorn Barracks", "title": "Edinburgh Chess Club", "title": "Threipmuir Reservoir",
$ sh morelike.sh Time_%28disambiguation%29 Defaults "title": "Atlantis: The Lost Empire", "title": "Stargate", "title": "Stargate SG-1", ======= Opening text no boost links "title": "Father Time (disambiguation)", "title": "The Last Time", "title": "Time After Time",
Le 20/01/2016 19:34, Jon Robson a écrit :
I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test we could even measure things such as click throughs in the related article feature (whether they go up or not)
Out of interest is it also possible to take article size and type into account and not returning any morelike results for things like disambiguation pages and stubs?
[1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso abaso@wikimedia.org wrote:
One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / timeboxed beta test, etc.
Joaquin, smaxage (e.g., 24 hour cached responses) does seem a good fix for now for further reduction of client perceived wait, at least for non-cold cache requests, even if we stop beating up the backend. Does anyone know of a compelling reason to not do that for the time being? The main thing that comes to mind as always is growing the Varnish cache object pool - probably not a huge deal while the thing is only in beta, but on the stable channel maybe noteworthy because it would run on probably most pages (but that's what edge caches are for, after all).
Erik, from your perspective does use of smaxage relieve the backend sufficiently?
If we do smaxage, then Web, Android, iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
-Adam
On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez jhernandez@wikimedia.org wrote:
I'd be up to it if we manage to cram it up in a following sprint and it is worth it.
We could run a controlled test against production with a long batch of articles and check median/percentiles response time with repeated runs and highlight the different results for human inspection regarding quality.
It's been noted previously that the results are far from ideal (which they are because it is just morelike), and I think it would be a great idea to change the endpoint to a specific one that is smarter and has some cache (we could do much more to get relevant results besides text similarity, take into account links, or see also links if there are, etc...).
As a note, in mobile web the related articles extension allows editors to specify articles to show in the section, which would avoid queries to cirrussearch if it was more used (once rolled into stable I guess).
I remember that the performance related task was closed as resolved (https://phabricator.wikimedia.org/T121254#1907192), should we reopen it or create a new one?
I'm not sure if we ended up adding the smaxage parameter (I think we didn't), should we? To me it seems a no-brainer that we should be caching this results in varnish since they don't need to be completely up to date for this use case.
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Hey all, am planning to look at Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
On Friday, January 22, 2016, Erik Bernhardson ebernhardson@wikimedia.org wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
Regarding the caching, we would need to agree between apps and web about the url and smaxage parameter as Adam noted so that the urls are *exactly* the same to not bloat varnish and reuse the same cached objects across platforms.
It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
20% of the search traffic coming from only Android and mobile web beta seems like a lot to me, and we should work on reducing it; otherwise when it hits web stable we're going to crush the servers, so caching seems the highest priority.
To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)
Let's chime in on https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should we proceed? Adam?
I've put together https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates for the two approaches.
On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcausse@wikimedia.org> wrote:
Hi,
Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others. Today Cirrus mostly uses the number of incoming links, which (imho) is not very good for morelike. On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash script to compare results: https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad. Here are some random results from the list (sometimes better, sometimes worse):
$ sh morelike.sh Revolution_Muslim
Defaults
  "title": "Chess",
  "title": "Suicide attack",
  "title": "Zachary Adam Chesser",
=======
Opening text no boost links
  "title": "Hungarian Revolution of 1956",
  "title": "Muslims for America",
  "title": "Salafist Front",

$ sh morelike.sh Chesser
Defaults
  "title": "Chess",
  "title": "Edinburgh",
  "title": "Edinburgh Corn Exchange",
=======
Opening text no boost links
  "title": "Dreghorn Barracks",
  "title": "Edinburgh Chess Club",
  "title": "Threipmuir Reservoir",

$ sh morelike.sh Time_%28disambiguation%29
Defaults
  "title": "Atlantis: The Lost Empire",
  "title": "Stargate",
  "title": "Stargate SG-1",
=======
Opening text no boost links
  "title": "Father Time (disambiguation)",
  "title": "The Last Time",
  "title": "Time After Time",
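The gist itself isn't reproduced here, but the core of such a comparison can be sketched as follows. This is a hypothetical reimplementation, not David's actual script; the cirrusMlt* parameters are the ones from Erik's original mail:

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def morelike_url(title, opening_text_only=False):
    """Build a morelike: search URL; optionally restrict the
    MoreLikeThis query to the opening_text field."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": "morelike:" + title,
        "srlimit": 3,
    }
    if opening_text_only:
        # Undocumented test parameters; the API warns about them,
        # but per Erik's mail the warning is safe to ignore.
        params["cirrusMltUseFields"] = "yes"
        params["cirrusMltFields"] = "opening_text"
    return API + "?" + urlencode(params)

# Fetching both variants and diffing the returned titles is then a
# matter of urllib.request.urlopen() plus json.load() on each URL.
```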
On 20/01/2016 19:34, Jon Robson wrote:
I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
Out of interest, is it also possible to take article size and type into account and not return any morelike results for things like disambiguation pages and stubs?
[1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <abaso@wikimedia.org> wrote:
One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / timeboxed beta test, etc.
Joaquin, smaxage (e.g., 24 hour cached responses) does seem a good fix for now for further reduction of client perceived wait, at least for non-cold cache requests, even if we stop beating up the backend. Does anyone know of a compelling reason to not do that for the time being? The main thing that comes to mind as always is growing the Varnish cache object pool - probably not a huge deal while the thing is only in beta, but on the stable channel maybe noteworthy because it would run on probably most pages (but that's what edge caches are for, after all).
Erik, from your perspective does use of smaxage relieve the backend sufficiently?
If we do smaxage, then Web, Android, iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
-Adam
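One way to get byte-identical URLs across Web, Android, and iOS is to agree on a canonical serialization (e.g. sorted parameter order). A sketch, purely illustrative rather than any client's actual code:

```python
from urllib.parse import urlencode

def canonical_api_url(host, params):
    """Serialize query parameters in sorted order so that all
    clients produce byte-identical URLs and therefore share the
    same Varnish cache objects."""
    return "https://%s/w/api.php?%s" % (host, urlencode(sorted(params.items())))

# The same logical request yields the same URL regardless of the
# order in which each client assembles its parameters:
a = canonical_api_url("en.m.wikipedia.org",
                      {"action": "query", "format": "json", "smaxage": 86400})
b = canonical_api_url("en.m.wikipedia.org",
                      {"smaxage": 86400, "format": "json", "action": "query"})
assert a == b
```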
Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think if we're doing near term experimentation with a controlled A/B test the Android app is the only logical place to start. Dmitry, can that work for you? It's not required, but I think it would be neat to see if we can move the needle even more. Of course your quarterly goals take top priority...but what do you think?
We are also happy to add cached entry points for high-traffic end points in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
Roger that! I think we could squeeze it in -- the change would be pretty straightforward. We'll be able to release a Beta with this A/B test in short order, but it will probably be a couple weeks until our next production release. I hope that's all right.
On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
We are also happy to add cached entry points for high-traffic end points in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso abaso@wikimedia.org wrote:
Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think
if
we're doing near term experimentation with a controlled A/B test the
Android
app is the only logical place to start. Dmitry, can that work for you?
It's
not required, but I think it would be neat to see if we can move the
needle
even more. Of course your quarterly goals take top priority...but what do you think?
On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso abaso@wikimedia.org wrote:
Hey all, am planning to look at Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your
replies!
On Friday, January 22, 2016, Erik Bernhardson <
ebernhardson@wikimedia.org>
wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez jhernandez@wikimedia.org wrote:
Regarding the caching, we would need to agree between apps and web
about
the url and smaxage parameter as Adam noted so that the urls are
exactly the
same to not bloat varnish and reuse the same cached objects across platforms.
It is an extremely adhoc and brittle solution but seems like it would
be
the greatest win.
20% of the traffic from searches by being only in android and web beta seems a lot to me, and we should work on reducing it, otherwise when
it hits
web stable we're going to crush the servers, so caching seems the
highest
priority.
To clarify its 20% of the load, as opposed to 20% of the traffic. But same difference :)
Let's chime in https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should
we
proceed? Adam?
I've put together https://phabricator.wikimedia.org/T124258 to track putting together an AB test that measures the difference in click
through
rates for the two approaches.
On Wed, Jan 20, 2016 at 9:34 PM, David Causse dcausse@wikimedia.org wrote:
Hi,
Yes we can combine many factors, from templates (quality but also disambiguation/stubs), size and others. Today cirrus uses mostly the number of incoming links which (imho) is not very good for morelike. On enwiki results will also be scored according the weights defined
in
https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates
.
I wrote a small bash to compare results : https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad Here is some random results from the list (Semetimes better,
sometimes
worse) :
$ sh morelike.sh Revolution_Muslim Defaults "title": "Chess", "title": "Suicide attack", "title": "Zachary Adam Chesser", ======= Opening text no boost links "title": "Hungarian Revolution of 1956", "title": "Muslims for America", "title": "Salafist Front",
$ sh morelike.sh Chesser Defaults "title": "Chess", "title": "Edinburgh", "title": "Edinburgh Corn Exchange", ======= Opening text no boost links "title": "Dreghorn Barracks", "title": "Edinburgh Chess Club", "title": "Threipmuir Reservoir",
$ sh morelike.sh Time_%28disambiguation%29 Defaults "title": "Atlantis: The Lost Empire", "title": "Stargate", "title": "Stargate SG-1", ======= Opening text no boost links "title": "Father Time (disambiguation)", "title": "The Last Time", "title": "Time After Time",
On 20/01/2016 19:34, Jon Robson wrote:

> I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
>
> Out of interest, is it also possible to take article size and type into account, and not return any morelike results for things like disambiguation pages and stubs?
>
> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>
> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso abaso@wikimedia.org wrote:
>> One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations, or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / a timeboxed beta test, etc.
>>
>> Joaquin, smaxage (e.g., 24-hour cached responses) does seem a good fix for now for further reduction of client-perceived wait, at least for non-cold cache requests, even if we stop beating up the backend. Does anyone know of a compelling reason not to do that for the time being? The main thing that comes to mind, as always, is growing the Varnish cache object pool - probably not a huge deal while the thing is only in beta, but on the stable channel maybe noteworthy, because it would run on probably most pages (but that's what edge caches are for, after all).
>>
>> Erik, from your perspective, does use of smaxage relieve the backend sufficiently?
>>
>> If we do smaxage, then Web, Android, and iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
>>
>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
>>
>> -Adam
>>
>> [...]
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
Just a quick note that our latest production release (just published) contains this A/B test, in addition to the other updates. Looking forward to seeing the numbers from this!
-Dmitry
On Sun, Jan 31, 2016 at 9:35 PM, Dmitry Brant dbrant@wikimedia.org wrote:
Roger that! I think we could squeeze it in -- the change would be pretty straightforward. We'll be able to release a Beta with this A/B test in short order, but it will probably be a couple weeks until our next production release. I hope that's all right.
On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
We are also happy to add cached entry points for high-traffic endpoints in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso abaso@wikimedia.org wrote:
Okay. As per https://phabricator.wikimedia.org/T124225#1984080, I think if we're doing near-term experimentation with a controlled A/B test, the Android app is the only logical place to start. Dmitry, can that work for you? It's not required, but I think it would be neat to see if we can move the needle even more. Of course your quarterly goals take top priority... but what do you think?
On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso abaso@wikimedia.org wrote:
Hey all, am planning to look at Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your replies!

On Friday, January 22, 2016, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez jhernandez@wikimedia.org wrote:

> Regarding the caching, we would need to agree between apps and web about the URL and the smaxage parameter, as Adam noted, so that the URLs are exactly the same, to avoid bloating Varnish and to reuse the same cached objects across platforms.
>
> It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
>
> 20% of the traffic from searches, from being only in Android and web beta, seems a lot to me, and we should work on reducing it; otherwise when it hits web stable we're going to crush the servers, so caching seems the highest priority.

To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)

> Let's chime in at https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
>
> Regarding the validity of results with opening text only, how should we proceed? Adam?

I've put together https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates for the two approaches.
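The URL standardization Adam and Joaquin describe comes down to every client emitting the parameters in one canonical order, plus the action API's smaxage/maxage cache-control parameters. A sketch under those assumptions (the parameter set and the 24-hour TTL are illustrative, not an agreed value):

```python
import urllib.parse

def canonical_api_url(base, params, smaxage=86400):
    """Build a cache-friendly API URL: parameters are sorted so Android,
    iOS, and mobile web all produce byte-identical URLs (and thus share one
    Varnish object), with smaxage/maxage allowing edge caching (24h here)."""
    params = dict(params, smaxage=str(smaxage), maxage=str(smaxage))
    query = urllib.parse.urlencode(sorted(params.items()))
    return base + "?" + query

# Two clients passing the same parameters in different orders
# end up with the same URL, hence the same cached object.
web = canonical_api_url("https://en.m.wikipedia.org/w/api.php",
                        {"action": "query", "format": "json"})
app = canonical_api_url("https://en.m.wikipedia.org/w/api.php",
                        {"format": "json", "action": "query"})
assert web == app
```

This is the "extremely ad hoc" part Joaquin flags: the canonicalization lives by convention in each client rather than being enforced anywhere.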
On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcausse@wikimedia.org> wrote:

> Hi,
>
> Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others. Today Cirrus uses mostly the number of incoming links, which (imho) is not very good for morelike. On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
>
> I wrote a small bash script to compare results: https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
>
> Here are some random results from the list (sometimes better, sometimes worse):
>
> [...]
--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
Hi,

Can someone on this list point me to where the more-like code sits? Better yet, would someone be willing to document the rules that govern prioritization of suggestions?

I would like to document the logic for our communities so that we can have an open discussion about what variables and weighting we should use to suggest articles.

-J
On Mon, Feb 15, 2016 at 11:26 AM, Dmitry Brant dbrant@wikimedia.org wrote:

> [...]
The more-like code lives in Elasticsearch; https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ml... gives a decent rundown of the various parameters available. The defaults we currently use are at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/C...

These can be overridden with a custom page on the wiki at MediaWiki:cirrussearch-morelikethis-settings. I can't suggest editors tune this on their own, though; it requires careful testing to see what the changes do. The same options can also be overridden at query time via a series of internal, test-only parameters implemented at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/i...
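For discussion purposes, the shape of the underlying Elasticsearch query looks roughly like the sketch below: more_like_this picks significant terms from the source document's configured fields, and restricting `fields` to `opening_text` is exactly what the experiment does. The knob values here are illustrative, not the ones CirrusSearch ships (those live in the linked config file), and the index/field names are assumptions.

```python
# Hand-written sketch of an Elasticsearch more_like_this query restricted
# to the opening_text field. Values and names are illustrative only.
def more_like_query(title, fields=("opening_text",)):
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [{"_index": "enwiki_content", "_id": title}],
                "max_query_terms": 25,  # cap on terms picked from the doc
                "min_term_freq": 2,     # ignore terms this rare in the doc
                "min_doc_freq": 5,      # ignore terms this rare in the index
            }
        }
    }
```

These are the same knobs the wiki-side settings page and the test-only query parameters ultimately adjust, which is why untested tweaks can shift result quality substantially.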
On Thu, Feb 18, 2016 at 4:00 PM, Jon Katz jkatz@wikimedia.org wrote:

> [...]
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
>>>>> >>>>> >>>>> -Adam >>>>> >>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez >>>>> jhernandez@wikimedia.org wrote: >>>>>> >>>>>> I'd be up to it if we manage to cram it up in a following
sprint and
>>>>>> it is >>>>>> worth it. >>>>>> >>>>>> We could run a controlled test against production with a long
batch
>>>>>> of >>>>>> articles and check median/percentiles response time with
repeated
>>>>>> runs and >>>>>> highlight the different results for human inspection regarding >>>>>> quality. >>>>>> >>>>>> It's been noted previously that the results are far from ideal >>>>>> (which they >>>>>> are because it is just morelike), and I think it would be a
great
>>>>>> idea to >>>>>> change the endpoint to a specific one that is smarter and has
some
>>>>>> cache (we >>>>>> could do much more to get relevant results besides text
similarity,
>>>>>> take >>>>>> into account links, or see also links if there are, etc...). >>>>>> >>>>>> As a note, in mobile web the related articles extension allows >>>>>> editors to >>>>>> specify articles to show in the section, which would avoid
queries
>>>>>> to >>>>>> cirrussearch if it was more used (once rolled into stable I
guess).
>>>>>> >>>>>> I remember that the performance related task was closed as
resolved
>>>>>> (https://phabricator.wikimedia.org/T121254#1907192), should we >>>>>> reopen it or >>>>>> create a new one? >>>>>> >>>>>> I'm not sure if we ended up adding the smaxage parameter (I
think we
>>>>>> didn't), should we? To me it seems a no-brainer that we should
be
>>>>>> caching >>>>>> this results in varnish since they don't need to be completely
up to
>>>>>> date >>>>>> for this use case. >>>>>> >>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson >>>>>> ebernhardson@wikimedia.org wrote: >>>>>>> >>>>>>> Both mobile apps and web are using CirrusSearch's morelike:
feature
>>>>>>> which >>>>>>> is showing some performance issues on our end. We would like
to
>>>>>>> make a >>>>>>> performance optimization to it, but before we would prefer to
run
>>>>>>> an A/B >>>>>>> test to see if the results are still "about as good" as they
are
>>>>>>> currently. >>>>>>> >>>>>>> The optimization is basically: Currently more like this takes
the
>>>>>>> entire >>>>>>> article into account, we would like to change this to take
only the
>>>>>>> opening >>>>>>> text of an article into account. This should reduce the
amount of
>>>>>>> work we >>>>>>> have to do on the backend saving both server load and latency
the
>>>>>>> user sees >>>>>>> running the query. >>>>>>> >>>>>>> This can be triggered by adding these two query parameters to
the
>>>>>>> search >>>>>>> api request that is being performed: >>>>>>> >>>>>>> cirrusMltUseFields=yes&cirrusMltFields=opening_text >>>>>>> >>>>>>> >>>>>>> The API will give a warning that these parameters do not
exist, but
>>>>>>> they >>>>>>> are safe to ignore. Would any of you be willing to run this
test?
>>>>>>> We would >>>>>>> basically want to look at user perceived latency along with
click
>>>>>>> through >>>>>>> rates for the current default setup along with the restricted
setup
>>>>>>> using >>>>>>> only opening_text. >>>>>>> >>>>>>> Erik B. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Mobile-l mailing list >>>>>>> Mobile-l@lists.wikimedia.org >>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>>>>>> >>>>> >>>>> _______________________________________________ >>>>> Mobile-l mailing list >>>>> Mobile-l@lists.wikimedia.org >>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>>>> >>>> _______________________________________________ >>>> Mobile-l mailing list >>>> Mobile-l@lists.wikimedia.org >>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>> >>> >>> _______________________________________________ >>> Mobile-l mailing list >>> Mobile-l@lists.wikimedia.org >>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >> >> >> >> _______________________________________________ >> Mobile-l mailing list >> Mobile-l@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/mobile-l >> >
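David's comparison script is bash; for readers who want to reproduce it, the title extraction can be sketched in Python. This is a hypothetical re-implementation, not the gist itself; the only assumption about the API is the standard MediaWiki search response shape ({"query": {"search": [{"title": ...}]}}), and canned responses stand in for live requests:

```python
# Hypothetical Python sketch of the comparison done by morelike.sh
# (https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad). The real script is
# bash; this only shows title extraction and side-by-side printing, using
# the standard MediaWiki search API response shape.

def extract_titles(api_response, limit=3):
    """Pull the top result titles out of a search API JSON response."""
    hits = api_response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits][:limit]

def side_by_side(label_a, resp_a, label_b, resp_b):
    """Format two result lists roughly the way the gist prints them."""
    lines = [label_a]
    lines += ['  "title": "%s",' % t for t in extract_titles(resp_a)]
    lines.append("=======")
    lines.append(label_b)
    lines += ['  "title": "%s",' % t for t in extract_titles(resp_b)]
    return "\n".join(lines)

# Canned responses standing in for two live API calls.
defaults = {"query": {"search": [{"title": "Chess"}, {"title": "Suicide attack"}]}}
opening = {"query": {"search": [{"title": "Muslims for America"}, {"title": "Salafist Front"}]}}

print(side_by_side("Defaults", defaults, "Opening text no boost links", opening))
```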
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
On Thu, Feb 18, 2016 at 4:00 PM, Jon Katz jkatz@wikimedia.org wrote:
Can someone on this list point me to where the more-like code sits? Or, better yet, someone documenting the rules that govern prioritization of suggestions.
I would like to document the logic for our communities so that we can have an open discussion about what variables and weighting we should use to suggest articles.
"More like" is an Elasticsearch (https://en.wikipedia.org/wiki/Elasticsearch) feature; the documentation is at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html. I'd imagine the source code is way too complicated to give much insight to the casual reader (as Elasticsearch is a large and complex piece of software), but I never looked into the ES codebase, so that's just a guess. The configuration we use for morelike queries is at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/CirrusSearch.php#L450. The wrapper code that fires the ES query is at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/includes/Searcher.php#L800 (but at a glance it doesn't do anything interesting).
Thanks both! This clarifies a lot. I think the primary issue that editors had raised and I had hoped to explore was popularity/importance v. obscurity.
Specifically, there have been concerns that the results tilt towards more popular articles (here https://www.mediawiki.org/wiki/Topic:Swjyfj59pkjfol7m and here https://www.mediawiki.org/wiki/Topic:Sxy84nxinxqqld2i), but it seems that page traffic is not a variable. Instead, what seems to be happening is that the raw # of similar terms is being used, rather than the % of similar terms. This means that longer articles are favored. Is that a fair assessment?
-J
There is a popularity factor at work: all CirrusSearch queries take into account the number of incoming links as part of a rescore on a few thousand of the top results.

There are a few ways we can tweak this. All of the examples below use internal testing query parameters; I can't suggest using these as part of normal production usage outside of A/B testing, but they work well for exploring variations.
Query patterns used:

  'opening text no boost links': '?search=morelike:%s&cirrusBoostLinks=no&cirrusMltUseFields=yes&cirrusMltFields=opening_text'
  'opening text': '?search=morelike:%s&cirrusMltUseFields=yes&cirrusMltFields=opening_text'
  'no boost links': '?search=morelike:%s&cirrusBoostLinks=no'
  'basic': '?search=morelike:%s'
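For anyone reproducing this, the patterns above can be turned into full request URLs with a small sketch. The variant-specific parameters are the ones from this thread; the endpoint and the action/list/format parameters are illustrative assumptions about how the search API would be called, not a prescription:

```python
from urllib.parse import quote

# Sketch of building full API URLs from the four query patterns above.
# The endpoint and action/list/format parameters are assumptions for
# illustration; the variant parameters are the internal testing ones
# discussed in this thread (not for normal production use).
API = "https://en.wikipedia.org/w/api.php?action=query&list=search&format=json"

VARIANTS = {
    "opening text no boost links": "&cirrusBoostLinks=no&cirrusMltUseFields=yes&cirrusMltFields=opening_text",
    "opening text": "&cirrusMltUseFields=yes&cirrusMltFields=opening_text",
    "no boost links": "&cirrusBoostLinks=no",
    "basic": "",
}

def morelike_url(title, variant):
    """Build the search API URL for one morelike variant."""
    return API + "&srsearch=" + quote("morelike:" + title) + VARIANTS[variant]

print(morelike_url("Chesser", "opening text no boost links"))
```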
Test output:

A_Summer_Bird-Cage:
  basic:
    I Know Why the Caged Bird Sings
    Princess Louise, Duchess of Argyll
    J. K. Rowling
  opening text:
    I Know Why the Caged Bird Sings
    Themes in Maya Angelou's autobiographies
    Abnormal behaviour of birds in captivity
  opening text no boost links:
    Themes in Maya Angelou's autobiographies
    Get Sexy
    I Know Why the Caged Bird Sings
  no boost links:
    I Know Why the Caged Bird Sings
    Jerusalem the Golden
    Princess Louise, Duchess of Argyll

Isabel_Fonseca:
  basic:
    Emma Goldman
    Martin Amis
    J. K. Rowling
  opening text:
    I Know Why the Caged Bird Sings
    Kate Millett
    Hillary Clinton
  opening text no boost links:
    I Know Why the Caged Bird Sings
    Mary Beth Keane
    Elizabeth F. Ellet
  no boost links:
    Martin Amis
    Margaret Fuller
    Emma Goldman

Andrew_Michael_Hurley:
  basic:
    J. K. Rowling
    Enid Blyton
    Ernest Shackleton
  opening text:
    List of James Bond novels and short stories
    Harry Potter
    James Bond
  opening text no boost links:
    List of James Bond novels and short stories
    Childhood's End
    Deborah Swift
  no boost links:
    Pure (Miller novel)
    The Other Hand
    Stella Gibbons

The_Queen_of_the_Tearling:
  basic:
    Emma Watson
    J. K. Rowling
    Emma Goldman
  opening text:
    The Sun Also Rises
    The Twilight Saga
    The Historian
  opening text no boost links:
    List of Buffyverse novels
    Witz (novel)
It's very hard to pick and choose a few small samples of queries and say "this is now better". I highly suggest, at a minimum, A/B testing variations and basing results on user click through and bounce rates. Back testing thousands of user queries and comparing them to user click through or satisfaction (clickthrough + dwell) might be much more useful.
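The click-through and satisfaction metrics Erik mentions could be computed from logged sessions roughly as below. This is a sketch only: the session record fields ("clicked", "dwell_seconds") and the 10-second dwell threshold are illustrative assumptions, not the actual Wikimedia instrumentation:

```python
# Sketch of per-bucket A/B metrics: clickthrough rate and "satisfaction"
# (clickthrough + dwell). The session fields and the dwell threshold are
# illustrative assumptions, not the real instrumentation.

DWELL_THRESHOLD = 10.0  # seconds a reader must stay for a click to count as satisfied

def clickthrough_rate(sessions):
    """Fraction of result impressions that led to any click."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["clicked"]) / len(sessions)

def satisfaction_rate(sessions):
    """Fraction of sessions with a click AND a long-enough dwell time."""
    if not sessions:
        return 0.0
    ok = sum(1 for s in sessions
             if s["clicked"] and s["dwell_seconds"] >= DWELL_THRESHOLD)
    return ok / len(sessions)

control = [
    {"clicked": True, "dwell_seconds": 45.0},
    {"clicked": True, "dwell_seconds": 2.0},   # bounce: clicked but left quickly
    {"clicked": False, "dwell_seconds": 0.0},
]
print(clickthrough_rate(control))   # 2 of 3 sessions clicked
print(satisfaction_rate(control))   # only 1 of 3 stayed past the threshold
```

Comparing the two rates per bucket is what separates "users clicked more" from "users actually found what they wanted".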
On Thu, Feb 18, 2016 at 5:01 PM, Erik Bernhardson < ebernhardson@wikimedia.org> wrote:
Back testing thousands of user queries and comparing them to user click through or satisfaction (clickthrough + dwell)
Thanks, Erik! This is very helpful. What do you mean by 'back testing'?
Also, even without boost links, there seems to be a bias towards popular (long) pages. It seems that a focus on the # of words in common, rather than the %, is one of the things leading to long articles seeing so much more traction - would this be an easy thing to test as well?
-J
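Jon's raw-count-versus-percentage point can be illustrated with a toy example. This is not the actual Elasticsearch more-like-this scoring, just a sketch of why unnormalized term overlap favours long documents:

```python
# Toy illustration: counting shared terms favours long articles, while
# normalizing by document length does not. NOT the real Elasticsearch
# more-like-this scoring, just a sketch of the length bias.

def shared_term_count(source_terms, doc_terms):
    """Raw number of source terms also present in the candidate document."""
    return len(set(source_terms) & set(doc_terms))

def shared_term_fraction(source_terms, doc_terms):
    """Shared terms as a fraction of the candidate's vocabulary."""
    doc = set(doc_terms)
    return len(set(source_terms) & doc) / len(doc)

source = {"chess", "club", "edinburgh"}
short_doc = ["edinburgh", "chess", "club", "founded"]  # focused stub
long_doc = ["edinburgh", "chess", "club", "history"] + ["filler%d" % i for i in range(100)]

# Raw counts tie, so the long article competes equally despite being mostly filler...
assert shared_term_count(source, short_doc) == shared_term_count(source, long_doc) == 3
# ...while the normalized score clearly prefers the focused short article.
assert shared_term_fraction(source, short_doc) > shared_term_fraction(source, long_doc)
```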
On 19 February 2016 at 17:13, Jon Katz jkatz@wikimedia.org wrote:
Thanks, Erik! This is very helpful. What do you mean by 'back testing'?
For search, there are a few different approaches to quantitative testing that are less difficult than A/B testing in terms of development overhead, data analysis, and coordination. One of those is to replay real user queries against the index, but run each query with slightly different parameters from the original. This is super cheap compared to an A/B test, but the downside is that it can only answer really deterministic (for lack of a better word) things, like how the parameters affect the zero-results rate or result ordering. Since there's no user interaction with the replayed queries, you don't know what the clickthrough would've been, so it's hard to measure how satisfied the user would've been.
Hopefully that helps explain it.
Thanks, Dan
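The replay ("back testing") approach Dan describes can be sketched as: re-run each logged query under both configurations, then compare deterministic properties such as the zero-results rate and how often the top results changed. The data below stands in for what two live query runners would return; it is illustrative only:

```python
# Sketch of "back testing": replay logged queries under two configurations
# and compare deterministic properties. The two result lists below stand in
# for what live runs of the baseline and candidate configs would return.

def zero_results_rate(results_per_query):
    """Fraction of queries that returned no results at all."""
    n = len(results_per_query)
    return sum(1 for r in results_per_query if not r) / n if n else 0.0

def ordering_changed(results_a, results_b, k=3):
    """True if the top-k titles differ (in content or order) between configs."""
    return results_a[:k] != results_b[:k]

baseline = [["Chess", "Edinburgh"], [], ["Stargate"]]
candidate = [["Edinburgh Chess Club", "Chess"], [], []]

print(zero_results_rate(baseline))   # share of baseline queries with no results
print(zero_results_rate(candidate))  # share of candidate queries with no results
changed = sum(ordering_changed(a, b) for a, b in zip(baseline, candidate))
print(changed)                       # how many queries saw their top results change
```

As Dan notes, these numbers say nothing about user satisfaction; they only flag how different the two configurations behave before spending an A/B test on them.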
Great explanation. Thanks, Dan!
On 20/02/2016 02:13, Jon Katz wrote:
Also, even without boost links, there seems to be a bias towards popular (long pages). it seems that a focus on # of words in common rather than % is one of the things leading to long articles seeing so much more traction - would this be an easy thing to test as well?
Hi,
You're right, but I think it's because of the boost-templates feature, which is enabled even when boostlinks is not: on enwiki a few templates are configured in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates, which means that a featured article will be overboosted.

We could fine-tune the core more-like algorithm with various params, but today I think the rescore features (boostlinks, boost-templates) are what have the most impact.
To sum up, two types of score are combined when ranking articles:
- a score that computes the similarity between documents; this can be fine-tuned [1]
- a score (we call it a "rescore") that uses article metadata: boostlinks, templates
The way these scores are combined can be configured with a rescore profile, but today it's a product of all the scores, e.g.
morelike:A_Summer_Bird-Cage
The score for "I Know Why the Caged Bird Sings" with boost links is:
- similarity: 0.3457441 (terms chosen: "from", "cage", "bird")
- boostlinks: 2.807535
- boost-templates: 2
- total: 0.3457441 * 2.807535 * 2 => 1.9413773
[1]: https://www.mediawiki.org/wiki/Help:CirrusSearch#morelike:
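David's product combination can be checked with a one-line sketch. The numbers are the ones he gives above; the plain product is as he describes for today's default, not the full configurable rescore profile:

```python
# Reproducing David's example: the final morelike score is the product of
# the similarity score and the rescore factors (boostlinks, boost-templates).
# This mirrors the default product combination described above, not the
# full configurable rescore profile.

def combined_score(similarity, boostlinks, boost_templates):
    """Product combination of the morelike scores, per the example above."""
    return similarity * boostlinks * boost_templates

# "I Know Why the Caged Bird Sings" for morelike:A_Summer_Bird-Cage
score = combined_score(similarity=0.3457441, boostlinks=2.807535, boost_templates=2)
print(round(score, 7))  # 1.9413773
```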