Both the mobile apps and mobile web are using CirrusSearch's morelike: feature, which is showing some performance issues on our end. We would like to make a performance optimization to it, but first we would prefer to run an A/B test to see if the results are still "about as good" as they are currently.
The optimization is basically this: currently "more like this" takes the entire article into account; we would like to change it to take only the opening text of an article into account. This should reduce the amount of work we have to do on the backend, saving both server load and the latency the user sees when running the query.
This can be triggered by adding these two query parameters to the search API request being performed:
cirrusMltUseFields=yes&cirrusMltFields=opening_text
The API will give a warning that these parameters do not exist, but it is safe to ignore. Would any of you be willing to run this test? We would basically want to compare user-perceived latency and click-through rates between the current default setup and the restricted setup using only opening_text.
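To make the request concrete, here is a sketch of building such a request URL. The article title "Chess" and the base search parameters are illustrative; the two cirrusMlt* parameters are the ones quoted above.

```shell
# Sketch: build an experimental morelike request URL.
# "Chess" and the base query parameters are example values.
BASE='https://en.wikipedia.org/w/api.php'
QUERY='action=query&list=search&format=json&srsearch=morelike:Chess'
EXPERIMENT='cirrusMltUseFields=yes&cirrusMltFields=opening_text'
URL="${BASE}?${QUERY}&${EXPERIMENT}"
echo "$URL"
```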
Erik B.
I'd be up for it if we can fit it into an upcoming sprint and it is worth it.
We could run a controlled test against production with a large batch of articles, check median/percentile response times over repeated runs, and flag the differing results for human inspection of quality.
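The latency side of that test reduces to computing percentiles over a file of per-request timings (one value per line, as produced by e.g. `curl -so /dev/null -w '%{time_total}\n' "$URL"`). A minimal sketch, using made-up sample timings and the nearest-rank percentile method:

```shell
# Sketch: compute median and p95 from a file of request latencies
# (seconds, one per line). The sample values are made up.
printf '0.12\n0.34\n0.20\n0.18\n0.55\n0.25\n0.30\n0.22\n0.40\n0.15\n' > latencies.txt
sort -n latencies.txt | awk '
  {a[NR] = $1}
  END {
    median = a[int((NR + 1) / 2)]
    p95    = a[int(NR * 0.95 + 0.999)]   # nearest-rank (ceiling) index
    printf "median=%s p95=%s\n", median, p95
  }'
```

With the sample values above this prints `median=0.22 p95=0.55`.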
It's been noted previously that the results are far from ideal (which they are, because it is just *morelike*), and I think it would be a great idea to move to a dedicated endpoint that is smarter and has some caching (we could do much more than text similarity to get relevant results: take links into account, or *see also* links where present, etc.).
As a note, on mobile web the RelatedArticles extension allows editors to specify the articles shown in the section, which would avoid queries to CirrusSearch if it were used more (once rolled into stable, I guess).
I remember that the performance-related task was closed as resolved (https://phabricator.wikimedia.org/T121254#1907192); should we reopen it or create a new one?
I'm not sure if we ended up adding the smaxage parameter (I think we didn't: https://github.com/wikimedia/mediawiki-extensions-RelatedArticles/search?utf8=%E2%9C%93&q=maxage&type=Code); should we? It seems a no-brainer to me that we should be caching these results in Varnish, since they don't need to be completely up to date for this use case.
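For reference, smaxage and maxage are standard api.php parameters that set the s-maxage / max-age values on the response's Cache-Control header, which is what lets Varnish cache the response. A sketch of what such a request could look like; the 86400-second TTL is just an example value for discussion, not a decided number:

```shell
# Sketch: a morelike request with cache-control parameters appended.
# smaxage/maxage are real api.php parameters; 86400 is an example TTL.
URL='https://en.m.wikipedia.org/w/api.php?action=query&format=json&srsearch=morelike:Chess&smaxage=86400&maxage=86400'
# To inspect the resulting header on a live request (not run here):
#   curl -sI "$URL" | grep -i '^cache-control'
echo "$URL"
```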
On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
One thing we could do regarding the quality of the output is check results against a random sample of popular articles on mdot Wikipedia (see https://phabricator.wikimedia.org/T120504#1900287 for an example approach to finding such articles). Assuming that improves the quality of the recommendations, or at least does not degrade it, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / a timeboxed beta test, etc.
Joaquin, smaxage (e.g., 24-hour cached responses) does seem like a good fix for now to further reduce client-perceived wait, at least for warm-cache requests, even once we stop beating up the backend. Does anyone know of a compelling reason not to do that for the time being? The main concern that comes to mind, as always, is growing the Varnish cache object pool - probably not a huge deal while the feature is only in beta, but on the stable channel it may be noteworthy, because it would run on most pages (but that's what edge caches are for, after all).
Erik, from your perspective does use of smaxage relieve the backend sufficiently?
If we do smaxage, then web, Android, and iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used today from mobile web beta:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
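One way clients could converge on byte-identical URLs (and therefore share Varnish cache objects) is to normalize the query string before sending it, e.g. by sorting the key=value pairs. A sketch; the helper name is ours, not an existing API:

```shell
# Sketch: canonicalize a query string so all platforms emit the same URL.
# Splits on '&', sorts the key=value pairs, and re-joins them.
canonicalize_query() {
  printf '%s' "$1" | tr '&' '\n' | sort | paste -sd '&' -
}
canonicalize_query 'format=json&action=query&srsearch=morelike:Chess'
# prints: action=query&format=json&srsearch=morelike:Chess
```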
-Adam
On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test, we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
Out of interest, is it also possible to take article size and type into account and not return any morelike results for things like disambiguation pages and stubs?
[1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso abaso@wikimedia.org wrote:
Hi,
Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others. Today Cirrus uses mostly the number of incoming links, which (imho) is not very good for morelike. On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash script to compare results: https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad. Here are some random results from the list (sometimes better, sometimes worse):
$ sh morelike.sh Revolution_Muslim
Defaults
"title": "Chess",
"title": "Suicide attack",
"title": "Zachary Adam Chesser",
=======
Opening text no boost links
"title": "Hungarian Revolution of 1956",
"title": "Muslims for America",
"title": "Salafist Front",

$ sh morelike.sh Chesser
Defaults
"title": "Chess",
"title": "Edinburgh",
"title": "Edinburgh Corn Exchange",
=======
Opening text no boost links
"title": "Dreghorn Barracks",
"title": "Edinburgh Chess Club",
"title": "Threipmuir Reservoir",

$ sh morelike.sh Time_%28disambiguation%29
Defaults
"title": "Atlantis: The Lost Empire",
"title": "Stargate",
"title": "Stargate SG-1",
=======
Opening text no boost links
"title": "Father Time (disambiguation)",
"title": "The Last Time",
"title": "Time After Time",
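For readers who don't want to open the gist, a sketch of what a comparison script like morelike.sh might look like (the gist is the authoritative version): fetch morelike results with the defaults and with opening_text only, printing just the titles.

```shell
# Sketch of a defaults-vs-opening_text comparison helper.
# The gist linked above is the real script; this is an approximation.
morelike_compare() {
  api='https://en.wikipedia.org/w/api.php'
  common="action=query&list=search&format=json&srlimit=3&srsearch=morelike:$1"
  echo 'Defaults'
  curl -s "${api}?${common}" | grep -o '"title":"[^"]*"'
  echo '======='
  echo 'Opening text only'
  curl -s "${api}?${common}&cirrusMltUseFields=yes&cirrusMltFields=opening_text" \
    | grep -o '"title":"[^"]*"'
}
# usage: morelike_compare Revolution_Muslim
```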
On 20/01/2016 at 19:34, Jon Robson wrote:
Regarding the caching, apps and web would need to agree on the URL and the smaxage parameter, as Adam noted, so that the URLs are *exactly* the same: that way we reuse the same cached objects across platforms and don't bloat Varnish.
It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
20% of the traffic coming from searches while this is only in Android and web beta seems like a lot to me, and we should work on reducing it; otherwise, when this hits web stable we're going to crush the servers. So caching seems the highest priority.
Let's chime in on https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should we proceed? Adam?
On Wed, Jan 20, 2016 at 9:34 PM, David Causse dcausse@wikimedia.org wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
Regarding the caching, we would need to agree between apps and web about the url and smaxage parameter as Adam noted so that the urls are *exactly* the same to not bloat varnish and reuse the same cached objects across platforms.
It is an extremely adhoc and brittle solution but seems like it would be the greatest win.
20% of the traffic from searches by being only in android and web beta seems a lot to me, and we should work on reducing it, otherwise when it hits web stable we're going to crush the servers, so caching seems the highest priority.
To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)
Let's chime in https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should we proceed? Adam?
I've created https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates between the two approaches.
On Wed, Jan 20, 2016 at 9:34 PM, David Causse dcausse@wikimedia.org wrote:
Hi,
Yes we can combine many factors, from templates (quality but also disambiguation/stubs), size and others. Today cirrus uses mostly the number of incoming links which (imho) is not very good for morelike. On enwiki results will also be scored according the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash to compare results : https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad Here is some random results from the list (Semetimes better, sometimes worse) :
$ sh morelike.sh Revolution_Muslim Defaults "title": "Chess", "title": "Suicide attack", "title": "Zachary Adam Chesser", ======= Opening text no boost links "title": "Hungarian Revolution of 1956", "title": "Muslims for America", "title": "Salafist Front",
$ sh morelike.sh Chesser Defaults "title": "Chess", "title": "Edinburgh", "title": "Edinburgh Corn Exchange", ======= Opening text no boost links "title": "Dreghorn Barracks", "title": "Edinburgh Chess Club", "title": "Threipmuir Reservoir",
$ sh morelike.sh Time_%28disambiguation%29 Defaults "title": "Atlantis: The Lost Empire", "title": "Stargate", "title": "Stargate SG-1", ======= Opening text no boost links "title": "Father Time (disambiguation)", "title": "The Last Time", "title": "Time After Time",
Le 20/01/2016 19:34, Jon Robson a écrit :
I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test we could even measure things such as click throughs in the related article feature (whether they go up or not)
Out of interest is it also possible to take article size and type into account and not returning any morelike results for things like disambiguation pages and stubs?
[1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso abaso@wikimedia.org wrote:
One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / timeboxed beta test, etc.
Joaquin, smaxage (e.g., 24 hour cached responses) does seem a good fix for now for further reduction of client perceived wait, at least for non-cold cache requests, even if we stop beating up the backend. Does anyone know of a compelling reason to not do that for the time being? The main thing that comes to mind as always is growing the Varnish cache object pool - probably not a huge deal while the thing is only in beta, but on the stable channel maybe noteworthy because it would run on probably most pages (but that's what edge caches are for, after all).
Erik, from your perspective does use of smaxage relieve the backend sufficiently?
If we do smaxage, then Web, Android, iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
-Adam
On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez jhernandez@wikimedia.org wrote:
I'd be up to it if we manage to cram it up in a following sprint and it is worth it.
We could run a controlled test against production with a long batch of articles and check median/percentiles response time with repeated runs and highlight the different results for human inspection regarding quality.
It's been noted previously that the results are far from ideal (which they are because it is just morelike), and I think it would be a great idea to change the endpoint to a specific one that is smarter and has some cache (we could do much more to get relevant results besides text similarity, take into account links, or see also links if there are, etc...).
As a note, in mobile web the related articles extension allows editors to specify articles to show in the section, which would avoid queries to cirrussearch if it was more used (once rolled into stable I guess).
I remember that the performance related task was closed as resolved (https://phabricator.wikimedia.org/T121254#1907192), should we reopen it or create a new one?
I'm not sure if we ended up adding the smaxage parameter (I think we didn't), should we? To me it seems a no-brainer that we should be caching this results in varnish since they don't need to be completely up to date for this use case.
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Hey all, am planning to look at Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
On Friday, January 22, 2016, Erik Bernhardson ebernhardson@wikimedia.org wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
Regarding the caching, we would need to agree between apps and web about the url and smaxage parameter as Adam noted so that the urls are *exactly* the same to not bloat varnish and reuse the same cached objects across platforms.
It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
20% of the search traffic coming from only Android and mobile web beta seems like a lot to me, and we should work on reducing it; otherwise when it hits web stable we're going to crush the servers, so caching seems the highest priority.
To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)
Let's chime in on https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should we proceed? Adam?
I've put together https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates for the two approaches.
On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcausse@wikimedia.org> wrote:
Hi,
Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others. Today Cirrus mostly uses the number of incoming links, which (imho) is not very good for morelike. On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash script to compare results: https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad. Here are some random results from the list (sometimes better, sometimes worse):
$ sh morelike.sh Revolution_Muslim
Defaults
  "title": "Chess",
  "title": "Suicide attack",
  "title": "Zachary Adam Chesser",
=======
Opening text no boost links
  "title": "Hungarian Revolution of 1956",
  "title": "Muslims for America",
  "title": "Salafist Front",

$ sh morelike.sh Chesser
Defaults
  "title": "Chess",
  "title": "Edinburgh",
  "title": "Edinburgh Corn Exchange",
=======
Opening text no boost links
  "title": "Dreghorn Barracks",
  "title": "Edinburgh Chess Club",
  "title": "Threipmuir Reservoir",

$ sh morelike.sh Time_%28disambiguation%29
Defaults
  "title": "Atlantis: The Lost Empire",
  "title": "Stargate",
  "title": "Stargate SG-1",
=======
Opening text no boost links
  "title": "Father Time (disambiguation)",
  "title": "The Last Time",
  "title": "Time After Time",
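The gist itself isn't reproduced here, but the core of such a comparison can be sketched as follows. This is a hypothetical reimplementation, not David's actual script; the cirrusMlt* parameters are the ones from Erik's original mail:

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def morelike_url(title, opening_text_only=False):
    """Build a morelike: search URL; optionally restrict the
    MoreLikeThis query to the opening_text field."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": "morelike:" + title,
        "srlimit": 3,
    }
    if opening_text_only:
        # Undocumented test parameters; the API warns about them,
        # but per Erik's mail the warning is safe to ignore.
        params["cirrusMltUseFields"] = "yes"
        params["cirrusMltFields"] = "opening_text"
    return API + "?" + urlencode(params)

# Fetching both variants and diffing the returned titles is then a
# matter of urllib.request.urlopen() plus json.load() on each URL.
```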
On 20/01/2016 19:34, Jon Robson wrote:
I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
Out of interest, is it also possible to take article size and type into account and not return any morelike results for things like disambiguation pages and stubs?
[1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <abaso@wikimedia.org> wrote:
One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / timeboxed beta test, etc.
Joaquin, smaxage (e.g., 24 hour cached responses) does seem a good fix for now for further reduction of client perceived wait, at least for non-cold cache requests, even if we stop beating up the backend. Does anyone know of a compelling reason to not do that for the time being? The main thing that comes to mind as always is growing the Varnish cache object pool - probably not a huge deal while the thing is only in beta, but on the stable channel maybe noteworthy because it would run on probably most pages (but that's what edge caches are for, after all).
Erik, from your perspective does use of smaxage relieve the backend sufficiently?
If we do smaxage, then Web, Android, iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
-Adam
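One way to get byte-identical URLs across Web, Android, and iOS is to agree on a canonical serialization (e.g. sorted parameter order). A sketch, purely illustrative rather than any client's actual code:

```python
from urllib.parse import urlencode

def canonical_api_url(host, params):
    """Serialize query parameters in sorted order so that all
    clients produce byte-identical URLs and therefore share the
    same Varnish cache objects."""
    return "https://%s/w/api.php?%s" % (host, urlencode(sorted(params.items())))

# The same logical request yields the same URL regardless of the
# order in which each client assembles its parameters:
a = canonical_api_url("en.m.wikipedia.org",
                      {"action": "query", "format": "json", "smaxage": 86400})
b = canonical_api_url("en.m.wikipedia.org",
                      {"smaxage": 86400, "format": "json", "action": "query"})
assert a == b
```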
Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think if we're doing near term experimentation with a controlled A/B test the Android app is the only logical place to start. Dmitry, can that work for you? It's not required, but I think it would be neat to see if we can move the needle even more. Of course your quarterly goals take top priority...but what do you think?
We are also happy to add cached entry points for high-traffic end points in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
Roger that! I think we could squeeze it in -- the change would be pretty straightforward. We'll be able to release a Beta with this A/B test in short order, but it will probably be a couple weeks until our next production release. I hope that's all right.
On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
We are also happy to add cached entry points for high-traffic end points in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso abaso@wikimedia.org wrote:
Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think
if
we're doing near term experimentation with a controlled A/B test the
Android
app is the only logical place to start. Dmitry, can that work for you?
It's
not required, but I think it would be neat to see if we can move the
needle
even more. Of course your quarterly goals take top priority...but what do you think?
On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso abaso@wikimedia.org wrote:
Hey all, am planning to look at Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your
replies!
On Friday, January 22, 2016, Erik Bernhardson <
ebernhardson@wikimedia.org>
wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez jhernandez@wikimedia.org wrote:
Regarding the caching, we would need to agree between apps and web
about
the url and smaxage parameter as Adam noted so that the urls are
exactly the
same to not bloat varnish and reuse the same cached objects across platforms.
It is an extremely adhoc and brittle solution but seems like it would
be
the greatest win.
20% of the traffic from searches by being only in android and web beta seems a lot to me, and we should work on reducing it, otherwise when
it hits
web stable we're going to crush the servers, so caching seems the
highest
priority.
To clarify its 20% of the load, as opposed to 20% of the traffic. But same difference :)
Let's chime in https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
Regarding the validity of results with opening text only, how should
we
proceed? Adam?
I've put together https://phabricator.wikimedia.org/T124258 to track putting together an AB test that measures the difference in click
through
rates for the two approaches.
On Wed, Jan 20, 2016 at 9:34 PM, David Causse dcausse@wikimedia.org wrote:
Hi,
Yes we can combine many factors, from templates (quality but also disambiguation/stubs), size and others. Today cirrus uses mostly the number of incoming links which (imho) is not very good for morelike. On enwiki results will also be scored according the weights defined
in
https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates
.
I wrote a small bash to compare results : https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad Here is some random results from the list (Semetimes better,
sometimes
worse) :
$ sh morelike.sh Revolution_Muslim Defaults "title": "Chess", "title": "Suicide attack", "title": "Zachary Adam Chesser", ======= Opening text no boost links "title": "Hungarian Revolution of 1956", "title": "Muslims for America", "title": "Salafist Front",
$ sh morelike.sh Chesser Defaults "title": "Chess", "title": "Edinburgh", "title": "Edinburgh Corn Exchange", ======= Opening text no boost links "title": "Dreghorn Barracks", "title": "Edinburgh Chess Club", "title": "Threipmuir Reservoir",
$ sh morelike.sh Time_%28disambiguation%29 Defaults "title": "Atlantis: The Lost Empire", "title": "Stargate", "title": "Stargate SG-1", ======= Opening text no boost links "title": "Father Time (disambiguation)", "title": "The Last Time", "title": "Time After Time",
On 20/01/2016 19:34, Jon Robson wrote:

> I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
>
> Out of interest, is it also possible to take article size and type into account, and not return any morelike results for things like disambiguation pages and stubs?
>
> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>
> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso abaso@wikimedia.org wrote:
>> One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations, or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / a timeboxed beta test, etc.
>>
>> Joaquin, smaxage (e.g., 24-hour cached responses) does seem a good fix for now for further reduction of client-perceived wait, at least for non-cold cache requests, even if we stop beating up the backend. Does anyone know of a compelling reason not to do that for the time being? The main thing that comes to mind, as always, is growing the Varnish cache object pool - probably not a huge deal while the thing is only in beta, but on the stable channel maybe noteworthy, because it would run on probably most pages (but that's what edge caches are for, after all).
>>
>> Erik, from your perspective, does use of smaxage relieve the backend sufficiently?
>>
>> If we do smaxage, then Web, Android, and iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
>>
>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
>>
>> -Adam
>>
>> [...]
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
Just a quick note that our latest production release (just published) contains this A/B test, in addition to the other updates. Looking forward to seeing the numbers from this!
-Dmitry
On Sun, Jan 31, 2016 at 9:35 PM, Dmitry Brant dbrant@wikimedia.org wrote:
Roger that! I think we could squeeze it in -- the change would be pretty straightforward. We'll be able to release a Beta with this A/B test in short order, but it will probably be a couple weeks until our next production release. I hope that's all right.
On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
We are also happy to add cached entry points for high-traffic endpoints in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso abaso@wikimedia.org wrote:
Okay. As per https://phabricator.wikimedia.org/T124225#1984080, I think if we're doing near-term experimentation with a controlled A/B test, the Android app is the only logical place to start. Dmitry, can that work for you? It's not required, but I think it would be neat to see if we can move the needle even more. Of course your quarterly goals take top priority... but what do you think?
On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso abaso@wikimedia.org wrote:
Hey all, am planning to look at Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your replies!

On Friday, January 22, 2016, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez jhernandez@wikimedia.org wrote:

> Regarding the caching, we would need to agree between apps and web about the URL and the smaxage parameter, as Adam noted, so that the URLs are exactly the same, to avoid bloating Varnish and to reuse the same cached objects across platforms.
>
> It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
>
> 20% of the traffic from searches, from being only in Android and web beta, seems a lot to me, and we should work on reducing it; otherwise when it hits web stable we're going to crush the servers, so caching seems the highest priority.

To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)

> Let's chime in at https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
>
> Regarding the validity of results with opening text only, how should we proceed? Adam?

I've put together https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates for the two approaches.
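The URL standardization Adam and Joaquin describe comes down to every client emitting the parameters in one canonical order, plus the action API's smaxage/maxage cache-control parameters. A sketch under those assumptions (the parameter set and the 24-hour TTL are illustrative, not an agreed value):

```python
import urllib.parse

def canonical_api_url(base, params, smaxage=86400):
    """Build a cache-friendly API URL: parameters are sorted so Android,
    iOS, and mobile web all produce byte-identical URLs (and thus share one
    Varnish object), with smaxage/maxage allowing edge caching (24h here)."""
    params = dict(params, smaxage=str(smaxage), maxage=str(smaxage))
    query = urllib.parse.urlencode(sorted(params.items()))
    return base + "?" + query

# Two clients passing the same parameters in different orders
# end up with the same URL, hence the same cached object.
web = canonical_api_url("https://en.m.wikipedia.org/w/api.php",
                        {"action": "query", "format": "json"})
app = canonical_api_url("https://en.m.wikipedia.org/w/api.php",
                        {"format": "json", "action": "query"})
assert web == app
```

This is the "extremely ad hoc" part Joaquin flags: the canonicalization lives by convention in each client rather than being enforced anywhere.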
On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcausse@wikimedia.org> wrote:

> Hi,
>
> Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others. Today Cirrus uses mostly the number of incoming links, which (imho) is not very good for morelike. On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
>
> I wrote a small bash script to compare results: https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
>
> Here are some random results from the list (sometimes better, sometimes worse):
>
> [...]
--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
Hi,

Can someone on this list point me to where the more-like code sits? Better yet, would someone be willing to document the rules that govern prioritization of suggestions?

I would like to document the logic for our communities so that we can have an open discussion about what variables and weighting we should use to suggest articles.

-J
On Mon, Feb 15, 2016 at 11:26 AM, Dmitry Brant dbrant@wikimedia.org wrote:

> [...]
The more-like code lives in Elasticsearch; https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ml... gives a decent rundown of the various parameters available. The defaults we currently use are at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/C...

These can be overridden with a custom page on the wiki at MediaWiki:cirrussearch-morelikethis-settings. I can't suggest editors tune this on their own, though; it requires careful testing to see what the changes do. The same options can also be overridden at query time via a series of internal, test-only parameters implemented at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/i...
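For discussion purposes, the shape of the underlying Elasticsearch query looks roughly like the sketch below: more_like_this picks significant terms from the source document's configured fields, and restricting `fields` to `opening_text` is exactly what the experiment does. The knob values here are illustrative, not the ones CirrusSearch ships (those live in the linked config file), and the index/field names are assumptions.

```python
# Hand-written sketch of an Elasticsearch more_like_this query restricted
# to the opening_text field. Values and names are illustrative only.
def more_like_query(title, fields=("opening_text",)):
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [{"_index": "enwiki_content", "_id": title}],
                "max_query_terms": 25,  # cap on terms picked from the doc
                "min_term_freq": 2,     # ignore terms this rare in the doc
                "min_doc_freq": 5,      # ignore terms this rare in the index
            }
        }
    }
```

These are the same knobs the wiki-side settings page and the test-only query parameters ultimately adjust, which is why untested tweaks can shift result quality substantially.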
On Thu, Feb 18, 2016 at 4:00 PM, Jon Katz jkatz@wikimedia.org wrote:

> [...]
https://en.m.wikipedia.org/w/api.php?action=query&format=json&format...
>>>>> >>>>> >>>>> -Adam >>>>> >>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez >>>>> jhernandez@wikimedia.org wrote: >>>>>> >>>>>> I'd be up to it if we manage to cram it up in a following
sprint and
>>>>>> it is >>>>>> worth it. >>>>>> >>>>>> We could run a controlled test against production with a long
batch
>>>>>> of >>>>>> articles and check median/percentiles response time with
repeated
>>>>>> runs and >>>>>> highlight the different results for human inspection regarding >>>>>> quality. >>>>>> >>>>>> It's been noted previously that the results are far from ideal >>>>>> (which they >>>>>> are because it is just morelike), and I think it would be a
great
>>>>>> idea to >>>>>> change the endpoint to a specific one that is smarter and has
some
>>>>>> cache (we >>>>>> could do much more to get relevant results besides text
similarity,
>>>>>> take >>>>>> into account links, or see also links if there are, etc...). >>>>>> >>>>>> As a note, in mobile web the related articles extension allows >>>>>> editors to >>>>>> specify articles to show in the section, which would avoid
queries
>>>>>> to >>>>>> cirrussearch if it was more used (once rolled into stable I
guess).
>>>>>> >>>>>> I remember that the performance related task was closed as
resolved
>>>>>> (https://phabricator.wikimedia.org/T121254#1907192), should we >>>>>> reopen it or >>>>>> create a new one? >>>>>> >>>>>> I'm not sure if we ended up adding the smaxage parameter (I
think we
>>>>>> didn't), should we? To me it seems a no-brainer that we should
be
>>>>>> caching >>>>>> this results in varnish since they don't need to be completely
up to
>>>>>> date >>>>>> for this use case. >>>>>> >>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson >>>>>> ebernhardson@wikimedia.org wrote: >>>>>>> >>>>>>> Both mobile apps and web are using CirrusSearch's morelike:
feature
>>>>>>> which >>>>>>> is showing some performance issues on our end. We would like
to
>>>>>>> make a >>>>>>> performance optimization to it, but before we would prefer to
run
>>>>>>> an A/B >>>>>>> test to see if the results are still "about as good" as they
are
>>>>>>> currently. >>>>>>> >>>>>>> The optimization is basically: Currently more like this takes
the
>>>>>>> entire >>>>>>> article into account, we would like to change this to take
only the
>>>>>>> opening >>>>>>> text of an article into account. This should reduce the
amount of
>>>>>>> work we >>>>>>> have to do on the backend saving both server load and latency
the
>>>>>>> user sees >>>>>>> running the query. >>>>>>> >>>>>>> This can be triggered by adding these two query parameters to
the
>>>>>>> search >>>>>>> api request that is being performed: >>>>>>> >>>>>>> cirrusMltUseFields=yes&cirrusMltFields=opening_text >>>>>>> >>>>>>> >>>>>>> The API will give a warning that these parameters do not
exist, but
>>>>>>> they >>>>>>> are safe to ignore. Would any of you be willing to run this
test?
>>>>>>> We would >>>>>>> basically want to look at user perceived latency along with
click
>>>>>>> through >>>>>>> rates for the current default setup along with the restricted
setup
>>>>>>> using >>>>>>> only opening_text. >>>>>>> >>>>>>> Erik B. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Mobile-l mailing list >>>>>>> Mobile-l@lists.wikimedia.org >>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>>>>>> >>>>> >>>>> _______________________________________________ >>>>> Mobile-l mailing list >>>>> Mobile-l@lists.wikimedia.org >>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>>>> >>>> _______________________________________________ >>>> Mobile-l mailing list >>>> Mobile-l@lists.wikimedia.org >>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>> >>> >>> _______________________________________________ >>> Mobile-l mailing list >>> Mobile-l@lists.wikimedia.org >>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >> >> >> >> _______________________________________________ >> Mobile-l mailing list >> Mobile-l@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/mobile-l >> >
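David's comparison script is bash; for readers who want to reproduce it, the title extraction can be sketched in Python. This is a hypothetical re-implementation, not the gist itself; the only assumption about the API is the standard MediaWiki search response shape ({"query": {"search": [{"title": ...}]}}), and canned responses stand in for live requests:

```python
# Hypothetical Python sketch of the comparison done by morelike.sh
# (https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad). The real script is
# bash; this only shows title extraction and side-by-side printing, using
# the standard MediaWiki search API response shape.

def extract_titles(api_response, limit=3):
    """Pull the top result titles out of a search API JSON response."""
    hits = api_response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits][:limit]

def side_by_side(label_a, resp_a, label_b, resp_b):
    """Format two result lists roughly the way the gist prints them."""
    lines = [label_a]
    lines += ['  "title": "%s",' % t for t in extract_titles(resp_a)]
    lines.append("=======")
    lines.append(label_b)
    lines += ['  "title": "%s",' % t for t in extract_titles(resp_b)]
    return "\n".join(lines)

# Canned responses standing in for two live API calls.
defaults = {"query": {"search": [{"title": "Chess"}, {"title": "Suicide attack"}]}}
opening = {"query": {"search": [{"title": "Muslims for America"}, {"title": "Salafist Front"}]}}

print(side_by_side("Defaults", defaults, "Opening text no boost links", opening))
```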
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
On Thu, Feb 18, 2016 at 4:00 PM, Jon Katz jkatz@wikimedia.org wrote:
Can someone on this list point me to where the more-like code sits? Or, better yet, someone documenting the rules that govern prioritization of suggestions.
I would like to document the logic for our communities so that we can have an open discussion about what variables and weighting we should use to suggest articles.
"More like" is an Elasticsearch (https://en.wikipedia.org/wiki/Elasticsearch) feature; the documentation is at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html. I'd imagine the source code is way too complicated to give much insight to the casual reader (as Elasticsearch is a large and complex piece of software), but I never looked into the ES codebase, so that's just a guess. The configuration we use for morelike queries is at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/CirrusSearch.php#L450. The wrapper code that fires the ES query is at https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/867248ccf522541922507f23a9ddd0783bed3699/includes/Searcher.php#L800 (but at a glance it doesn't do anything interesting).
Thanks both! This clarifies a lot. I think the primary issue that editors had raised and I had hoped to explore was popularity/importance v. obscurity.
Specifically, there have been concerns that the results tilt towards more popular articles (here https://www.mediawiki.org/wiki/Topic:Swjyfj59pkjfol7m and here https://www.mediawiki.org/wiki/Topic:Sxy84nxinxqqld2i), but it seems that page traffic is not a variable. Instead, what seems to be happening is that the raw # of similar terms is being used, rather than the % of similar terms. This means that longer articles are favored. Is that a fair assessment?
-J
There is a popularity factor at work: all CirrusSearch queries take into account the number of incoming links as part of a rescore on a few thousand of the top results.

There are a few ways we can tweak this. All of the examples below use internal testing query parameters; I can't suggest using these as part of normal production usage outside of A/B testing, but they work well for exploring variations.
Query patterns used:

  'opening text no boost links': '?search=morelike:%s&cirrusBoostLinks=no&cirrusMltUseFields=yes&cirrusMltFields=opening_text'
  'opening text': '?search=morelike:%s&cirrusMltUseFields=yes&cirrusMltFields=opening_text'
  'no boost links': '?search=morelike:%s&cirrusBoostLinks=no'
  'basic': '?search=morelike:%s'
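For anyone reproducing this, the patterns above can be turned into full request URLs with a small sketch. The variant-specific parameters are the ones from this thread; the endpoint and the action/list/format parameters are illustrative assumptions about how the search API would be called, not a prescription:

```python
from urllib.parse import quote

# Sketch of building full API URLs from the four query patterns above.
# The endpoint and action/list/format parameters are assumptions for
# illustration; the variant parameters are the internal testing ones
# discussed in this thread (not for normal production use).
API = "https://en.wikipedia.org/w/api.php?action=query&list=search&format=json"

VARIANTS = {
    "opening text no boost links": "&cirrusBoostLinks=no&cirrusMltUseFields=yes&cirrusMltFields=opening_text",
    "opening text": "&cirrusMltUseFields=yes&cirrusMltFields=opening_text",
    "no boost links": "&cirrusBoostLinks=no",
    "basic": "",
}

def morelike_url(title, variant):
    """Build the search API URL for one morelike variant."""
    return API + "&srsearch=" + quote("morelike:" + title) + VARIANTS[variant]

print(morelike_url("Chesser", "opening text no boost links"))
```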
Test output:

A_Summer_Bird-Cage:
  basic:
    I Know Why the Caged Bird Sings
    Princess Louise, Duchess of Argyll
    J. K. Rowling
  opening text:
    I Know Why the Caged Bird Sings
    Themes in Maya Angelou's autobiographies
    Abnormal behaviour of birds in captivity
  opening text no boost links:
    Themes in Maya Angelou's autobiographies
    Get Sexy
    I Know Why the Caged Bird Sings
  no boost links:
    I Know Why the Caged Bird Sings
    Jerusalem the Golden
    Princess Louise, Duchess of Argyll

Isabel_Fonseca:
  basic:
    Emma Goldman
    Martin Amis
    J. K. Rowling
  opening text:
    I Know Why the Caged Bird Sings
    Kate Millett
    Hillary Clinton
  opening text no boost links:
    I Know Why the Caged Bird Sings
    Mary Beth Keane
    Elizabeth F. Ellet
  no boost links:
    Martin Amis
    Margaret Fuller
    Emma Goldman

Andrew_Michael_Hurley:
  basic:
    J. K. Rowling
    Enid Blyton
    Ernest Shackleton
  opening text:
    List of James Bond novels and short stories
    Harry Potter
    James Bond
  opening text no boost links:
    List of James Bond novels and short stories
    Childhood's End
    Deborah Swift
  no boost links:
    Pure (Miller novel)
    The Other Hand
    Stella Gibbons

The_Queen_of_the_Tearling:
  basic:
    Emma Watson
    J. K. Rowling
    Emma Goldman
  opening text:
    The Sun Also Rises
    The Twilight Saga
    The Historian
  opening text no boost links:
    List of Buffyverse novels
    Witz (novel)
It's very hard to pick and choose a few small samples of queries and say "this is now better". I highly suggest, at a minimum, A/B testing variations and basing results on user click through and bounce rates. Back testing thousands of user queries and comparing them to user click through or satisfaction (clickthrough + dwell) might be much more useful.
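The click-through and satisfaction metrics Erik mentions could be computed from logged sessions roughly as below. This is a sketch only: the session record fields ("clicked", "dwell_seconds") and the 10-second dwell threshold are illustrative assumptions, not the actual Wikimedia instrumentation:

```python
# Sketch of per-bucket A/B metrics: clickthrough rate and "satisfaction"
# (clickthrough + dwell). The session fields and the dwell threshold are
# illustrative assumptions, not the real instrumentation.

DWELL_THRESHOLD = 10.0  # seconds a reader must stay for a click to count as satisfied

def clickthrough_rate(sessions):
    """Fraction of result impressions that led to any click."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["clicked"]) / len(sessions)

def satisfaction_rate(sessions):
    """Fraction of sessions with a click AND a long-enough dwell time."""
    if not sessions:
        return 0.0
    ok = sum(1 for s in sessions
             if s["clicked"] and s["dwell_seconds"] >= DWELL_THRESHOLD)
    return ok / len(sessions)

control = [
    {"clicked": True, "dwell_seconds": 45.0},
    {"clicked": True, "dwell_seconds": 2.0},   # bounce: clicked but left quickly
    {"clicked": False, "dwell_seconds": 0.0},
]
print(clickthrough_rate(control))   # 2 of 3 sessions clicked
print(satisfaction_rate(control))   # only 1 of 3 stayed past the threshold
```

Comparing the two rates per bucket is what separates "users clicked more" from "users actually found what they wanted".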
On Thu, Feb 18, 2016 at 5:01 PM, Erik Bernhardson < ebernhardson@wikimedia.org> wrote:
Back testing thousands of user queries and comparing them to user click through or satisfaction (clickthrough + dwell)
Thanks, Erik! This is very helpful. What do you mean by 'back testing'?
Also, even without boost links, there seems to be a bias towards popular (long) pages. It seems that a focus on the # of words in common, rather than the %, is one of the things leading to long articles seeing so much more traction - would this be an easy thing to test as well?
-J
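Jon's raw-count-versus-percentage point can be illustrated with a toy example. This is not the actual Elasticsearch more-like-this scoring, just a sketch of why unnormalized term overlap favours long documents:

```python
# Toy illustration: counting shared terms favours long articles, while
# normalizing by document length does not. NOT the real Elasticsearch
# more-like-this scoring, just a sketch of the length bias.

def shared_term_count(source_terms, doc_terms):
    """Raw number of source terms also present in the candidate document."""
    return len(set(source_terms) & set(doc_terms))

def shared_term_fraction(source_terms, doc_terms):
    """Shared terms as a fraction of the candidate's vocabulary."""
    doc = set(doc_terms)
    return len(set(source_terms) & doc) / len(doc)

source = {"chess", "club", "edinburgh"}
short_doc = ["edinburgh", "chess", "club", "founded"]  # focused stub
long_doc = ["edinburgh", "chess", "club", "history"] + ["filler%d" % i for i in range(100)]

# Raw counts tie, so the long article competes equally despite being mostly filler...
assert shared_term_count(source, short_doc) == shared_term_count(source, long_doc) == 3
# ...while the normalized score clearly prefers the focused short article.
assert shared_term_fraction(source, short_doc) > shared_term_fraction(source, long_doc)
```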
On 19 February 2016 at 17:13, Jon Katz jkatz@wikimedia.org wrote:
Thanks, Erik! This is very helpful. What do you mean by 'back testing'?
For search, there are a few different approaches to quantitative testing that are less difficult than A/B testing in terms of development overhead, data analysis, and coordination. One of those is to replay real user queries against the index, but run each query with slightly different parameters from the original. This is super cheap compared to an A/B test, but the downside is that it can only answer really deterministic (for lack of a better word) things, like how the parameters affect the zero-results rate or result ordering. Since there's no user interaction with the replayed queries, you don't know what the clickthrough would've been, so it's hard to measure how satisfied the user would've been.
Hopefully that helps explain it.
Thanks, Dan
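The replay ("back testing") approach Dan describes can be sketched as: re-run each logged query under both configurations, then compare deterministic properties such as the zero-results rate and how often the top results changed. The data below stands in for what two live query runners would return; it is illustrative only:

```python
# Sketch of "back testing": replay logged queries under two configurations
# and compare deterministic properties. The two result lists below stand in
# for what live runs of the baseline and candidate configs would return.

def zero_results_rate(results_per_query):
    """Fraction of queries that returned no results at all."""
    n = len(results_per_query)
    return sum(1 for r in results_per_query if not r) / n if n else 0.0

def ordering_changed(results_a, results_b, k=3):
    """True if the top-k titles differ (in content or order) between configs."""
    return results_a[:k] != results_b[:k]

baseline = [["Chess", "Edinburgh"], [], ["Stargate"]]
candidate = [["Edinburgh Chess Club", "Chess"], [], []]

print(zero_results_rate(baseline))   # share of baseline queries with no results
print(zero_results_rate(candidate))  # share of candidate queries with no results
changed = sum(ordering_changed(a, b) for a, b in zip(baseline, candidate))
print(changed)                       # how many queries saw their top results change
```

As Dan notes, these numbers say nothing about user satisfaction; they only flag how different the two configurations behave before spending an A/B test on them.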
Great explanation. Thanks, Dan!
On 20/02/2016 02:13, Jon Katz wrote:
Also, even without boost links, there seems to be a bias towards popular (long pages). it seems that a focus on # of words in common rather than % is one of the things leading to long articles seeing so much more traction - would this be an easy thing to test as well?
Hi,
You're right, but I think it's because of the boost-templates feature, which is enabled even when boostlinks is not: on enwiki a few templates are configured in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates, which means that a featured article will be overboosted.

We could fine-tune the core more-like algorithm with various params, but today I think the rescore features (boostlinks, boost-templates) are what have the most impact.
To sum up, two types of score are combined when ranking articles:
- a score that computes the similarity between documents; this can be fine-tuned [1]
- a score (we call it a "rescore") that uses article metadata: boostlinks, templates
The way these scores are combined can be configured with a rescore profile, but today it's a product of all the scores, e.g.
morelike:A_Summer_Bird-Cage
The score for "I Know Why the Caged Bird Sings" with boost links is:
- similarity: 0.3457441 (terms chosen: "from", "cage", "bird")
- boostlinks: 2.807535
- boost-templates: 2
- total: 0.3457441 * 2.807535 * 2 => 1.9413773
[1]: https://www.mediawiki.org/wiki/Help:CirrusSearch#morelike:
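David's product combination can be checked with a one-line sketch. The numbers are the ones he gives above; the plain product is as he describes for today's default, not the full configurable rescore profile:

```python
# Reproducing David's example: the final morelike score is the product of
# the similarity score and the rescore factors (boostlinks, boost-templates).
# This mirrors the default product combination described above, not the
# full configurable rescore profile.

def combined_score(similarity, boostlinks, boost_templates):
    """Product combination of the morelike scores, per the example above."""
    return similarity * boostlinks * boost_templates

# "I Know Why the Caged Bird Sings" for morelike:A_Summer_Bird-Cage
score = combined_score(similarity=0.3457441, boostlinks=2.807535, boost_templates=2)
print(round(score, 7))  # 1.9413773
```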