Just a quick note that our latest production release (just published) contains this A/B test, in addition to the other updates.
Looking forward to seeing the numbers from this!

-Dmitry


On Sun, Jan 31, 2016 at 9:35 PM, Dmitry Brant <dbrant@wikimedia.org> wrote:
Roger that! I think we could squeeze it in -- the change would be pretty straightforward. We'll be able to release a Beta with this A/B test in short order, but it will probably be a couple weeks until our next production release. I hope that's all right.


On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
We are also happy to add cached entry points for high-traffic
endpoints in the REST API. I commented to that effect at
https://phabricator.wikimedia.org/T124216#1984206. Let us know if you
think this would be useful for this use case.

On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso <abaso@wikimedia.org> wrote:
> Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think if
> we're doing near term experimentation with a controlled A/B test the Android
> app is the only logical place to start. Dmitry, can that work for you? It's
> not required, but I think it would be neat to see if we can move the needle
> even more. Of course your quarterly goals take top priority...but what do
> you think?
>
> On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso <abaso@wikimedia.org> wrote:
>>
>> Hey all, am planning to look at Phabricator tasks and provide a reply
>> during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
>>
>>
>> On Friday, January 22, 2016, Erik Bernhardson <ebernhardson@wikimedia.org>
>> wrote:
>>>
>>> On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez
>>> <jhernandez@wikimedia.org> wrote:
>>>>
>>>> Regarding the caching, apps and web would need to agree on the URL and
>>>> the smaxage parameter, as Adam noted, so that the URLs are exactly the
>>>> same. That avoids bloating Varnish and lets us reuse the same cached
>>>> objects across platforms.
>>>>
>>>> It is an extremely ad hoc and brittle solution, but it seems like it
>>>> would be the greatest win.
>>>>
>>>> 20% of the traffic coming from these searches, while the feature is only
>>>> in Android and web beta, seems like a lot to me. We should work on
>>>> reducing it; otherwise, when it hits web stable we're going to crush the
>>>> servers, so caching seems the highest priority.
>>>>
>>> To clarify, it's 20% of the load, as opposed to 20% of the traffic. But
>>> same difference :)
>>>
>>>>
>>>> Let's chime in on https://phabricator.wikimedia.org/T124216 and continue
>>>> the cache discussion there.
>>>>
>>>> Regarding the validity of results with opening text only, how should we
>>>> proceed? Adam?
>>>>
>>> I've put together https://phabricator.wikimedia.org/T124258 to track
>>> putting together an A/B test that measures the difference in click-through
>>> rates for the two approaches.
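
To illustrate what such a test involves, the sketch below assigns each client to one of the two approaches and computes a click-through rate per bucket. It is a minimal sketch, not the instrumentation actually used: the install ID, the bucket names, and both function names are hypothetical.

```python
import hashlib

# Hypothetical bucket names for the two approaches being compared.
BUCKETS = ("default", "opening_text")


def assign_bucket(install_id):
    """Deterministically map a (hypothetical) install ID to a bucket.

    Hashing keeps the assignment stable across sessions without
    storing any per-client state.
    """
    digest = hashlib.sha256(install_id.encode("utf-8")).hexdigest()
    return BUCKETS[int(digest, 16) % len(BUCKETS)]


def click_through_rate(clicks, impressions):
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions
```

Comparing click_through_rate() between the two buckets gives the difference the task sets out to measure; deciding whether that difference is significant is a separate step not shown here.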
>>>
>>>
>>>>
>>>> On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcausse@wikimedia.org>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Yes we can combine many factors, from templates (quality but also
>>>>> disambiguation/stubs), size and others.
>>>>> Today cirrus uses mostly the number of incoming links which (imho) is
>>>>> not very good for morelike.
>>>>> On enwiki results will also be scored according to the weights defined in
>>>>> https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
>>>>>
>>>>> I wrote a small bash script to compare results:
>>>>> https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
>>>>> Here are some random results from the list (sometimes better, sometimes
>>>>> worse):
>>>>>
>>>>> $ sh morelike.sh Revolution_Muslim
>>>>> Defaults
>>>>>         "title": "Chess",
>>>>>         "title": "Suicide attack",
>>>>>         "title": "Zachary Adam Chesser",
>>>>> =======
>>>>> Opening text no boost links
>>>>>         "title": "Hungarian Revolution of 1956",
>>>>>         "title": "Muslims for America",
>>>>>         "title": "Salafist Front",
>>>>>
>>>>> $ sh morelike.sh Chesser
>>>>> Defaults
>>>>>         "title": "Chess",
>>>>>         "title": "Edinburgh",
>>>>>         "title": "Edinburgh Corn Exchange",
>>>>> =======
>>>>> Opening text no boost links
>>>>>         "title": "Dreghorn Barracks",
>>>>>         "title": "Edinburgh Chess Club",
>>>>>         "title": "Threipmuir Reservoir",
>>>>>
>>>>> $ sh morelike.sh Time_%28disambiguation%29
>>>>> Defaults
>>>>>         "title": "Atlantis: The Lost Empire",
>>>>>         "title": "Stargate",
>>>>>         "title": "Stargate SG-1",
>>>>> =======
>>>>> Opening text no boost links
>>>>>         "title": "Father Time (disambiguation)",
>>>>>         "title": "The Last Time",
>>>>>         "title": "Time After Time",
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 20/01/2016 19:34, Jon Robson wrote:
>>>>>>
>>>>>> I'm actually interested to see whether this yields better results in
>>>>>> certain examples where the algorithm is lacking [1]. If it's done as
>>>>>> an A/B test we could even measure things such as click-throughs in the
>>>>>> related article feature (whether they go up or not).
>>>>>>
>>>>>> Out of interest, is it also possible to take article size and type into
>>>>>> account and not return any morelike results for things like
>>>>>> disambiguation pages and stubs?
>>>>>>
>>>>>> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <abaso@wikimedia.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> One thing we could do regarding the quality of the output is check
>>>>>>> results against a random sample of popular articles (example approach
>>>>>>> to find some articles) on mdot Wikipedia. Presuming that improves the
>>>>>>> quality of the recommendations or at least does not degrade them, we
>>>>>>> should consider adding the enhancement task to a future sprint, with
>>>>>>> further instrumentation and A/B testing / timeboxed beta test, etc.
>>>>>>>
>>>>>>> Joaquin, smaxage (e.g., 24 hour cached responses) does seem a good fix
>>>>>>> for now for further reduction of client perceived wait, at least for
>>>>>>> non-cold cache requests, even if we stop beating up the backend. Does
>>>>>>> anyone know of a compelling reason to not do that for the time being?
>>>>>>> The main thing that comes to mind as always is growing the Varnish
>>>>>>> cache object pool - probably not a huge deal while the thing is only
>>>>>>> in beta, but on the stable channel maybe noteworthy because it would
>>>>>>> run on probably most pages (but that's what edge caches are for, after
>>>>>>> all).
>>>>>>>
>>>>>>> Erik, from your perspective does use of smaxage relieve the backend
>>>>>>> sufficiently?
>>>>>>>
>>>>>>> If we do smaxage, then Web, Android, iOS should standardize their URLs
>>>>>>> so we get more cache hits at the edge across all clients. Here's the
>>>>>>> URL I see being used on the web today from mobile web beta:
>>>>>>>
>>>>>>>
>>>>>>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages%7Cpageterms&piprop=thumbnail&pithumbsize=80&wbptterms=description&pilimit=3&generator=search&gsrsearch=morelike%3ACome_Share_My_Love&gsrnamespace=0&gsrlimit=3
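
One way clients could converge on identical URLs is to run every request through the same canonicalization step. The sketch below is only an illustration under assumptions: the function name is hypothetical, sorting parameters alphabetically is just one possible convention, and the 86400-second default mirrors the 24-hour example mentioned above rather than any agreed value.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url, smaxage_seconds=86400):
    """Return the URL with smaxage set and parameters in a fixed order.

    Two requests that differ only in parameter order or in how they
    percent-encode a value end up as the same string, so an edge cache
    keyed on the full URL can serve both from one cached object.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["smaxage"] = str(smaxage_seconds)
    query = urlencode(sorted(params.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

This only helps if all clients also agree on the host and on the exact set of parameters; a request sent to a different domain or with a different thumbnail size would still produce a separate cache entry.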
>>>>>>>
>>>>>>>
>>>>>>> -Adam
>>>>>>>
>>>>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez
>>>>>>> <jhernandez@wikimedia.org> wrote:
>>>>>>>>
>>>>>>>> I'd be up for it if we manage to cram it into a following sprint and
>>>>>>>> it is worth it.
>>>>>>>>
>>>>>>>> We could run a controlled test against production with a long batch of
>>>>>>>> articles and check median/percentiles response time with repeated runs
>>>>>>>> and highlight the different results for human inspection regarding
>>>>>>>> quality.
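
The response-time half of such a test comes down to summarizing a list of measured latencies. The following is a minimal sketch of that summary step only: it assumes the timings have already been collected in milliseconds, the function names are hypothetical, and nearest-rank is just one of several common percentile definitions.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Median, 95th and 99th percentile of measured response times."""
    return {
        "median": statistics.median(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
```

Running summarize() once per variant over repeated runs gives numbers that can be compared side by side; the quality comparison would still need human inspection as described above.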
>>>>>>>>
>>>>>>>> It's been noted previously that the results are far from ideal (which
>>>>>>>> they are because it is just morelike), and I think it would be a great
>>>>>>>> idea to change the endpoint to a specific one that is smarter and has
>>>>>>>> some cache (we could do much more to get relevant results besides text
>>>>>>>> similarity, take into account links, or see also links if there are,
>>>>>>>> etc...).
>>>>>>>>
>>>>>>>> As a note, in mobile web the related articles extension allows editors
>>>>>>>> to specify articles to show in the section, which would avoid queries
>>>>>>>> to cirrussearch if it was more used (once rolled into stable I guess).
>>>>>>>>
>>>>>>>> I remember that the performance related task was closed as resolved
>>>>>>>> (https://phabricator.wikimedia.org/T121254#1907192), should we reopen
>>>>>>>> it or create a new one?
>>>>>>>>
>>>>>>>> I'm not sure if we ended up adding the smaxage parameter (I think we
>>>>>>>> didn't), should we? To me it seems a no-brainer that we should be
>>>>>>>> caching these results in varnish since they don't need to be completely
>>>>>>>> up to date for this use case.
>>>>>>>>
>>>>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson
>>>>>>>> <ebernhardson@wikimedia.org> wrote:
>>>>>>>>>
>>>>>>>>> Both mobile apps and web are using CirrusSearch's morelike: feature
>>>>>>>>> which is showing some performance issues on our end. We would like to
>>>>>>>>> make a performance optimization to it, but before we would prefer to
>>>>>>>>> run an A/B test to see if the results are still "about as good" as
>>>>>>>>> they are currently.
>>>>>>>>>
>>>>>>>>> The optimization is basically: Currently more like this takes the
>>>>>>>>> entire article into account, we would like to change this to take only
>>>>>>>>> the opening text of an article into account. This should reduce the
>>>>>>>>> amount of work we have to do on the backend saving both server load
>>>>>>>>> and latency the user sees running the query.
>>>>>>>>>
>>>>>>>>> This can be triggered by adding these two query parameters to the
>>>>>>>>> search api request that is being performed:
>>>>>>>>>
>>>>>>>>> cirrusMltUseFields=yes&cirrusMltFields=opening_text
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The API will give a warning that these parameters do not exist, but
>>>>>>>>> they are safe to ignore. Would any of you be willing to run this test?
>>>>>>>>> We would basically want to look at user perceived latency along with
>>>>>>>>> click through rates for the current default setup along with the
>>>>>>>>> restricted setup using only opening_text.
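
For a client that wants to try this, the two parameters can simply be appended to the request it already makes. The sketch below shows one way to build both variants; the function name and the opening_text_only flag are hypothetical, and the base parameters are copied from the mobile web beta request quoted elsewhere in this thread, so other clients may send a different set.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.m.wikipedia.org/w/api.php"


def build_morelike_url(title, opening_text_only=False):
    """Build a morelike: search URL, optionally limited to opening_text."""
    params = [
        ("action", "query"),
        ("format", "json"),
        ("formatversion", "2"),
        ("prop", "pageimages|pageterms"),
        ("piprop", "thumbnail"),
        ("pithumbsize", "80"),
        ("wbptterms", "description"),
        ("pilimit", "3"),
        ("generator", "search"),
        ("gsrsearch", "morelike:" + title),
        ("gsrnamespace", "0"),
        ("gsrlimit", "3"),
    ]
    if opening_text_only:
        # The two parameters described in the message above.
        params.append(("cirrusMltUseFields", "yes"))
        params.append(("cirrusMltFields", "opening_text"))
    return API_ENDPOINT + "?" + urlencode(params)
```

Calling build_morelike_url("Come_Share_My_Love") gives the default request, and passing opening_text_only=True gives the restricted variant to compare against it.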
>>>>>>>>>
>>>>>>>>> Erik B.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Mobile-l mailing list
>>>>>>>>> Mobile-l@lists.wikimedia.org
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>>>>>>>>



--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation




--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering



