Roger that! I think we could squeeze it in; the change would be pretty
straightforward. We'll be able to release a Beta with this A/B test in
short order, but it will probably be a couple of weeks until our next
production release. I hope that's all right.
On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke <gwicke(a)wikimedia.org> wrote:
We are also happy to add cached entry points for high-traffic endpoints in
the REST API. I commented to that effect at
https://phabricator.wikimedia.org/T124216#1984206. Let us know if you
think this would be useful for this use case.
On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
> Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think if
> we're doing near-term experimentation with a controlled A/B test, the Android
> app is the only logical place to start. Dmitry, can that work for you? It's
> not required, but I think it would be neat to see if we can move the needle
> even more. Of course, your quarterly goals take top priority... but what do
> you think?
>
> On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
>>
>> Hey all, I am planning to look at the Phabricator tasks and provide a reply
>> during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
>>
>>
>> On Friday, January 22, 2016, Erik Bernhardson <ebernhardson(a)wikimedia.org>
>> wrote:
>>>
>>> On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez
>>> <jhernandez(a)wikimedia.org> wrote:
>>>>
>>>> Regarding the caching, we would need apps and web to agree on the URL and
>>>> smaxage parameter, as Adam noted, so that the URLs are exactly the same;
>>>> that way we don't bloat Varnish, and we reuse the same cached objects
>>>> across platforms.
>>>>
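A minimal sketch of the URL standardization idea, assuming each client builds its request from a set of key/value parameters: sorting the parameters before encoding means web, Android and iOS emit byte-identical URLs, and since Varnish's default cache key is the request URL plus host, they then share one cached object. The `list=search` / `morelike:` parameters and the `smaxage` value below are illustrative assumptions; the actual parameter set the clients agree on is still to be decided.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.m.wikipedia.org/w/api.php"


def canonical_api_url(params: dict) -> str:
    """Build an API URL with its query parameters in a fixed (sorted) order.

    Two clients sending the same parameters in a different order would
    otherwise produce two distinct URLs, and therefore two cache entries.
    """
    query = urlencode(sorted(params.items()))
    return f"{API_ENDPOINT}?{query}"


# Hypothetical parameter sets, assembled in different orders by two clients.
web = {"action": "query", "format": "json", "list": "search",
       "srsearch": "morelike:Chess", "smaxage": "86400"}
app = {"smaxage": "86400", "srsearch": "morelike:Chess", "list": "search",
       "format": "json", "action": "query"}

print(canonical_api_url(web))
print(canonical_api_url(web) == canonical_api_url(app))
```

Sorting is just one possible convention; any fixed order works, as long as every client uses the same one.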
>>>> It is an extremely ad hoc and brittle solution, but it seems like it would
>>>> be the greatest win.
>>>>
>>>> Morelike accounting for 20% of the search traffic while it is only in
>>>> Android and web beta seems like a lot to me, and we should work on reducing
>>>> it; otherwise, when it hits web stable, we're going to crush the servers.
>>>> So caching seems the highest priority.
>>>>
>>> To clarify, it's 20% of the load, as opposed to 20% of the traffic. But
>>> same difference :)
>>>
>>>>
>>>> Let's chime in on https://phabricator.wikimedia.org/T124216 and continue
>>>> the cache discussion there.
>>>>
>>>> Regarding the validity of results with opening text only, how should we
>>>> proceed? Adam?
>>>>
>>> I've created https://phabricator.wikimedia.org/T124258 to track
>>> putting together an A/B test that measures the difference in click-through
>>> rates for the two approaches.
>>>
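One way the click-through comparison could be computed, assuming each logged impression records the user's bucket and whether they clicked a related article. The `Impression` shape and bucket names are hypothetical, not an actual EventLogging schema; the significance check is a standard two-proportion z-test, swapped in here as an example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Impression:
    bucket: str    # hypothetical bucket label, e.g. "default" or "opening_text"
    clicked: bool  # whether the user clicked any related article


def tally(events):
    """Return {bucket: (clicks, impressions)} from logged impressions."""
    counts = defaultdict(lambda: [0, 0])
    for e in events:
        counts[e.bucket][0] += int(e.clicked)
        counts[e.bucket][1] += 1
    return {bucket: (c, n) for bucket, (c, n) in counts.items()}


def two_proportion_z(c_a, n_a, c_b, n_b):
    """z statistic for the difference between two click-through rates."""
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # every impression clicked (or none did): no measurable difference
    return (c_a / n_a - c_b / n_b) / se


events = [Impression("default", True), Impression("default", False),
          Impression("opening_text", True), Impression("opening_text", False)]
counts = tally(events)
(c_a, n_a), (c_b, n_b) = counts["default"], counts["opening_text"]
print(c_a / n_a, c_b / n_b, two_proportion_z(c_a, n_a, c_b, n_b))
```

As a rule of thumb, |z| above about 1.96 suggests a difference at roughly the 5% level (two-sided). With real traffic, each bucket would need many impressions before that threshold means much.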
>>>
>>>>
>>>> On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcausse(a)wikimedia.org>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, we can combine many factors: templates (quality, but also
>>>>> disambiguation/stubs), size, and others.
>>>>> Today Cirrus mostly uses the number of incoming links, which (IMHO) is
>>>>> not very good for morelike.
>>>>> On enwiki, results will also be scored according to the weights defined in
>>>>> https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
>>>>>
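To make the idea of combining factors concrete, here is a hypothetical rescoring function. It multiplies a text-similarity score by per-template weights and a damped incoming-link factor. The weight table only imitates the spirit of the Cirrussearch-boost-templates page; the template names, values, and log damping are illustrative assumptions, not how CirrusSearch actually computes scores.

```python
import math

# Hypothetical template weights (1.0 = neutral); values are made up for illustration.
TEMPLATE_WEIGHTS = {
    "Featured article": 2.0,
    "Good article": 1.5,
    "Disambiguation": 0.1,
    "Stub": 0.5,
}


def rescore(text_score: float, templates: list, incoming_links: int) -> float:
    """Combine text similarity with quality signals (illustrative only)."""
    weight = 1.0
    for t in templates:
        weight *= TEMPLATE_WEIGHTS.get(t, 1.0)  # unknown templates are neutral
    # log1p damps the link count so that heavily linked pages do not
    # swamp the text-similarity signal.
    link_factor = 1.0 + math.log1p(incoming_links) / 10
    return text_score * weight * link_factor
```

The multiplicative form lets a single strong negative signal, such as a disambiguation template, push a page down regardless of its link count.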
>>>>> I wrote a small bash script to compare results:
>>>>> https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
>>>>> Here are some random results from the list (sometimes better, sometimes
>>>>> worse):
>>>>>
>>>>> $ sh morelike.sh Revolution_Muslim
>>>>> Defaults
>>>>> "title": "Chess",
>>>>> "title": "Suicide attack",
>>>>> "title": "Zachary Adam Chesser",
>>>>> =======
>>>>> Opening text no boost links
>>>>>             "title": "Hungarian Revolution of 1956",
>>>>> "title": "Muslims for America",
>>>>> "title": "Salafist Front",
>>>>>
>>>>> $ sh morelike.sh Chesser
>>>>> Defaults
>>>>> "title": "Chess",
>>>>> "title": "Edinburgh",
>>>>> "title": "Edinburgh Corn Exchange",
>>>>> =======
>>>>> Opening text no boost links
>>>>> "title": "Dreghorn Barracks",
>>>>> "title": "Edinburgh Chess Club",
>>>>> "title": "Threipmuir Reservoir",
>>>>>
>>>>> $ sh morelike.sh Time_%28disambiguation%29
>>>>> Defaults
>>>>> "title": "Atlantis: The Lost Empire",
>>>>> "title": "Stargate",
>>>>> "title": "Stargate SG-1",
>>>>> =======
>>>>> Opening text no boost links
>>>>>             "title": "Father Time (disambiguation)",
>>>>> "title": "The Last Time",
>>>>> "title": "Time After Time",
>>>>>
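One way to go beyond eyeballing "sometimes better, sometimes worse" is to measure how much the two top-3 lists overlap. The sketch below uses the titles from the examples above. Jaccard overlap is an assumed metric here, not something the comparison script computes.

```python
def jaccard(a: list, b: list) -> float:
    """Share of titles common to both lists (1.0 = identical sets)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# (Defaults, Opening text no boost links) for each query above.
results = {
    "Revolution_Muslim": (
        ["Chess", "Suicide attack", "Zachary Adam Chesser"],
        ["Hungarian Revolution of 1956", "Muslims for America", "Salafist Front"],
    ),
    "Chesser": (
        ["Chess", "Edinburgh", "Edinburgh Corn Exchange"],
        ["Dreghorn Barracks", "Edinburgh Chess Club", "Threipmuir Reservoir"],
    ),
    "Time_%28disambiguation%29": (
        ["Atlantis: The Lost Empire", "Stargate", "Stargate SG-1"],
        ["Father Time (disambiguation)", "The Last Time", "Time After Time"],
    ),
}

for title, (defaults, opening_text) in results.items():
    print(f"{title}: overlap {jaccard(defaults, opening_text):.2f}")
```

For all three examples the overlap is 0, so the two configurations return entirely different top-3 lists. That is why human judgment or click-through data is needed to decide which configuration is better.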
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 20/01/2016 19:34, Jon Robson wrote:
>>>>>>
>>>>>> I'm actually interested to see whether this yields better results in
>>>>>> certain examples where the algorithm is lacking [1]. If it's done as
>>>>>> an A/B test, we could even measure things such as click-throughs in the
>>>>>> related articles feature (whether they go up or not).
>>>>>>
>>>>>> Out of interest, is it also possible to take article size and type into
>>>>>> account, and not return any morelike results for things like
>>>>>> disambiguation pages and stubs?
>>>>>>
>>>>>> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>>>>>>
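Suppressing morelike results for disambiguation pages and stubs could be a simple guard applied before the query is sent. This sketch assumes the caller already knows the page's length in bytes and whether it is a disambiguation page (for example, from page properties on wikis that mark such pages). The `PageInfo` type and the 1,500-byte threshold are hypothetical placeholders.

```python
from dataclasses import dataclass

MIN_LENGTH_BYTES = 1500  # hypothetical stub threshold; would need tuning per wiki


@dataclass
class PageInfo:
    title: str
    length_bytes: int
    is_disambiguation: bool


def should_show_related(page: PageInfo) -> bool:
    """Return False for pages where morelike results are unlikely to help."""
    if page.is_disambiguation:
        return False
    # Very short pages (likely stubs) give morelike little text to work with.
    return page.length_bytes >= MIN_LENGTH_BYTES
```

A side benefit is that each page skipped this way is one fewer CirrusSearch query.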
>>>>>>
>>>>>> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <abaso(a)wikimedia.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> One thing we could do regarding the quality of the output is check
>>>>>>> results against a random sample of popular articles (example approach to
>>>>>>> find some articles) on mdot Wikipedia. Presuming that improves the quality
>>>>>>> of the recommendations, or at least does not degrade them, we should
>>>>>>> consider adding the enhancement task to a future sprint, with further
>>>>>>> instrumentation and A/B testing / timeboxed beta test, etc.
>>>>>>>
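Here is one way to draw that sample, assuming a list of popular article titles has already been obtained (for instance, from pageview data). Seeding the generator keeps the sample reproducible, so the default and opening-text runs are compared on the same articles. The function name and seed are hypothetical.

```python
import random


def sample_articles(popular_titles: list, k: int, seed: int = 2016) -> list:
    """Pick up to k distinct titles; the same seed always yields the same sample."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    return rng.sample(popular_titles, min(k, len(popular_titles)))
```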
>>>>>>> Joaquin, smaxage (e.g., 24-hour cached responses) does seem a good fix for
>>>>>>> now to further reduce client-perceived wait, at least for non-cold cache
>>>>>>> requests, even if we stop beating up the backend. Does anyone know of a
>>>>>>> compelling reason not to do that for the time being? The main thing that
>>>>>>> comes to mind, as always, is growing the Varnish cache object pool. That's
>>>>>>> probably not a huge deal while the feature is only in beta, but on the
>>>>>>> stable channel it may be noteworthy, because it would probably run on most
>>>>>>> pages (but that's what edge caches are for, after all).
>>>>>>>
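With smaxage set, the API response should carry an `s-maxage` directive in its `Cache-Control` header, which tells shared caches such as Varnish how long they may serve the object without going back to the backend. A 24-hour window is 24 × 60 × 60 = 86400 seconds. The helper below could be used to check what a response actually sends; the header string in the example is illustrative, not captured from production.

```python
from typing import Optional

ONE_DAY = 24 * 60 * 60  # 86400 seconds


def s_maxage_seconds(cache_control: str) -> Optional[int]:
    """Extract the s-maxage value (in seconds) from a Cache-Control header."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        # Directive names are case-insensitive.
        if name.lower() == "s-maxage" and value.isdigit():
            return int(value)
    return None


header = "s-maxage=86400, max-age=0, public"  # illustrative header value
print(s_maxage_seconds(header) == ONE_DAY)
```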
>>>>>>> Erik, from your perspective, does using smaxage relieve the backend
>>>>>>> sufficiently?
>>>>>>>
>>>>>>> If we do smaxage, then Web, Android, and iOS should standardize their URLs
>>>>>>> so we get more cache hits at the edge across all clients. Here's the URL I
>>>>>>> see being used on the web today from mobile web beta:
>>>>>>>
>>>>>>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&forma…
>>>>>>>
>>>>>>>
>>>>>>> -Adam
>>>>>>>
>>>>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez
>>>>>>> <jhernandez(a)wikimedia.org> wrote:
>>>>>>>>
>>>>>>>> I'd be up for it if we manage to cram it into a following sprint and it
>>>>>>>> is worth it.
>>>>>>>>
>>>>>>>> We could run a controlled test against production with a long batch of
>>>>>>>> articles, check median/percentile response times over repeated runs, and
>>>>>>>> highlight the differing results for human inspection regarding quality.
>>>>>>>>
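The latency side of such a controlled test comes down to summarizing many timings per configuration. This minimal sketch uses Python's `statistics` module and assumes response times (in milliseconds) have already been collected over repeated runs. Percentiles come from `statistics.quantiles` with its default method.

```python
import statistics


def summarize_latencies(ms: list) -> dict:
    """Median and tail percentiles of response times in milliseconds."""
    if len(ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; cuts[k - 1] approximates the k-th percentile.
    cuts = statistics.quantiles(ms, n=100)
    return {
        "median": statistics.median(ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

Running this separately for the default and opening-text timings gives directly comparable numbers. The tail percentiles matter most, since that is where users feel a slow backend.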
>>>>>>>> It's been noted previously that the results are far from ideal (which they
>>>>>>>> are, because it is just morelike), and I think it would be a great idea to
>>>>>>>> change the endpoint to a specific one that is smarter and has some cache (we
>>>>>>>> could do much more to get relevant results besides text similarity, such as
>>>>>>>> taking links into account, or "see also" links if there are any, etc.).
>>>>>>>>
>>>>>>>> As a note, in mobile web the related articles extension allows editors to
>>>>>>>> specify articles to show in the section, which would avoid queries to
>>>>>>>> CirrusSearch if it were used more (once rolled into stable, I guess).
>>>>>>>>
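The fallback described above could work like this: use editor-curated related articles when a page has them, and only fall back to a CirrusSearch morelike query otherwise. Here `fetch_morelike` is a hypothetical stand-in for whatever request the client actually makes, and the three-item limit is an assumption.

```python
from typing import Callable, List


def related_articles(curated: List[str], title: str,
                     fetch_morelike: Callable[[str], List[str]],
                     limit: int = 3) -> List[str]:
    """Prefer editor-specified articles; query the search backend only if none exist."""
    if curated:
        return curated[:limit]  # no backend query at all
    return fetch_morelike(title)[:limit]
```

Topping up a short curated list with morelike results would improve coverage but bring the backend query back, so this sketch deliberately does not do that.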
>>>>>>>> I remember that the performance-related task was closed as resolved
>>>>>>>> (https://phabricator.wikimedia.org/T121254#1907192). Should we reopen it
>>>>>>>> or create a new one?
>>>>>>>>
>>>>>>>> I'm not sure if we ended up adding the smaxage parameter (I think we
>>>>>>>> didn't); should we? To me it seems a no-brainer that we should be caching
>>>>>>>> these results in Varnish, since they don't need to be completely up to
>>>>>>>> date for this use case.
>>>>>>>>
>>>>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson
>>>>>>>> <ebernhardson(a)wikimedia.org> wrote:
>>>>>>>>>
>>>>>>>>> Both the mobile apps and web are using CirrusSearch's morelike: feature,
>>>>>>>>> which is showing some performance issues on our end. We would like to make
>>>>>>>>> a performance optimization to it, but before that we would prefer to run an
>>>>>>>>> A/B test to see if the results are still "about as good" as they are
>>>>>>>>> currently.
>>>>>>>>>
>>>>>>>>> The optimization is basically this: currently, more like this takes the
>>>>>>>>> entire article into account; we would like to change it to take only the
>>>>>>>>> opening text of an article into account. This should reduce the amount of
>>>>>>>>> work we have to do on the backend, saving both server load and the latency
>>>>>>>>> the user sees when running the query.
>>>>>>>>>
>>>>>>>>> This can be triggered by adding these two query parameters to the search
>>>>>>>>> API request that is being performed:
>>>>>>>>>
>>>>>>>>>   cirrusMltUseFields=yes&cirrusMltFields=opening_text
>>>>>>>>>
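For clients that assemble the request themselves, enabling the experiment means adding two query parameters. This sketch builds both variants of the request and assumes a `list=search` morelike query. The base parameters are illustrative; only `cirrusMltUseFields` and `cirrusMltFields` come from this thread.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

# Parameters from this thread that restrict morelike to the opening text.
OPENING_TEXT_ONLY = {"cirrusMltUseFields": "yes", "cirrusMltFields": "opening_text"}


def morelike_url(title: str, opening_text_only: bool) -> str:
    """Build the default or the opening-text-only variant of a morelike request."""
    # Illustrative base parameters; real clients send their own set.
    params = {"action": "query", "format": "json", "list": "search",
              "srsearch": f"morelike:{title}"}
    if opening_text_only:
        params.update(OPENING_TEXT_ONLY)
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

As noted above, the API may warn that these parameters are unrecognized, so client code should not treat API warnings as errors for this request.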
>>>>>>>>>
>>>>>>>>> The API will give a warning that these parameters do not exist, but the
>>>>>>>>> warnings are safe to ignore. Would any of you be willing to run this test?
>>>>>>>>> We would basically want to look at user-perceived latency along with
>>>>>>>>> click-through rates for the current default setup and for the restricted
>>>>>>>>> setup using only opening_text.
>>>>>>>>>
>>>>>>>>> Erik B.
>>>>>>>>>
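For the A/B test itself, each user needs to land consistently in either the default bucket or the opening-text-only bucket, so that latency and click-through can be attributed to one setup. A common approach is to hash a stable per-install identifier; this is sketched here as an assumption, not as how any of the apps actually assign buckets. The experiment name is hypothetical and simply salts the hash so that different experiments split users independently.

```python
import hashlib

BUCKETS = ("default", "opening_text")


def assign_bucket(install_id: str, experiment: str = "morelike_opening_text") -> str:
    """Deterministically map an install ID to a test bucket (roughly 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{install_id}".encode("utf-8")).digest()
    return BUCKETS[digest[0] % len(BUCKETS)]
```

Because assignment depends only on the ID, no server-side state is needed, and a user never switches buckets mid-test.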
>>>>>>>>> _______________________________________________
>>>>>>>>> Mobile-l mailing list
>>>>>>>>> Mobile-l(a)lists.wikimedia.org
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>>>>>>>>
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation