On Wed, May 27, 2015 at 12:33 PM, Adam Baso <abaso@wikimedia.org> wrote:

Thought I'd step in here. People who know the mechanics of the relevant logging are as follows:

Android: Dmitry Brant
Desktop: Bahodir (Baha) Mansurov
Mobile Web: Sam Smith & Baha

CC'ing them.

As I understand it, Dan Garry has already been talking with Baha about the desktop piece.

It looks like on Chrome/42 UAs the clickthrough rate per search-suggestion form interaction is about 40% (112 of 286 enwiki sessions on 26 May). [1]
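For anyone reproducing this outside MySQL, here is a minimal Python sketch of the same per-session clickthrough calculation. It assumes events have already been exported as (session token, action) pairs standing in for the event_userSessionToken and event_action columns; the non-click action name in the toy data ("impression") is a hypothetical placeholder, not necessarily a real Search schema value.

```python
from typing import Iterable, Set, Tuple

# Assumption: each event is a (session_token, action) pair exported from the
# Search EL table (event_userSessionToken, event_action).
Event = Tuple[str, str]


def clickthrough_rate(events: Iterable[Event]) -> float:
    """Fraction of distinct sessions with at least one 'click-result' event."""
    all_sessions: Set[str] = set()
    clicked_sessions: Set[str] = set()
    for token, action in events:
        all_sessions.add(token)
        if action == "click-result":
            clicked_sessions.add(token)
    if not all_sessions:
        return 0.0  # no sessions, so no meaningful rate
    return len(clicked_sessions) / len(all_sessions)


# Hypothetical toy data; "impression" is a placeholder action name.
sample = [
    ("a", "impression"), ("a", "click-result"),
    ("b", "impression"),
    ("c", "impression"), ("c", "click-result"),
]
print(clickthrough_rate(sample))  # 2 of 3 sessions clicked through
```

With the distinct-session counts from [1], the same ratio is 112 / 286, or roughly 0.39.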

A couple of patches are pending (JS for emitting events on the new EL schema) that will make it possible to also figure out how often form submission (ENTER/RETURN, tapping the magnifying glass) occurs within a form interaction.

Total form interactions (keyed on userSessionToken; maybe a better name could be used) minus clickthroughs (click-result) minus form submissions (submit-form, in the pending patches on the new EL schema) would be a rough proxy for abandonment, I think.
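The abandonment proxy above can be sketched in Python under the same assumption of exported (session token, action) pairs, with "submit-form" being the action name from the pending patches. One deliberate choice: it subtracts the union of clicking and submitting sessions, so a session that did both is not subtracted twice; when there is no overlap this equals the plain total-minus-clicks-minus-submits arithmetic.

```python
from typing import Iterable, Set, Tuple

# Assumption: events are (session_token, action) pairs; "submit-form" is the
# action name from the pending patches on the new EL schema.
Event = Tuple[str, str]


def abandoned_sessions(events: Iterable[Event]) -> int:
    """Rough abandonment proxy: sessions with neither a click nor a submit."""
    sessions: Set[str] = set()
    engaged: Set[str] = set()
    for token, action in events:
        sessions.add(token)
        if action in ("click-result", "submit-form"):
            engaged.add(token)
    # Subtracting the union avoids double-counting sessions that did both.
    return len(sessions - engaged)


sample = [
    ("a", "click-result"),
    ("b", "submit-form"),
    ("c", "impression"),  # hypothetical placeholder action
    ("d", "click-result"), ("d", "submit-form"),
]
print(abandoned_sessions(sample))  # only session "c" counts as abandoned
```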

I had heard that sendBeacon-capable UAs were likely to have greater success emitting the click-result event via mw.track (i.e., when the user clicks on a suggestion from the form on the search panel on desktop). So it may make sense to confine queries for such analysis on desktop to known sendBeacon browsers [2], to increase the odds of high-fidelity data in case there are outlier browsers that somehow manage to emit click-result events through means other than sendBeacon (it seems there may be some of these, assuming non-forged UAs).
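A minimal sketch of that filtering step, in Python, assuming user agents are classified by a simple regex on the Chrome/ and Firefox/ tokens. The minimum versions (Chrome 39, Firefox 31) are assumptions to be checked against the compatibility table in [2], and UA sniffing is only approximate: other browsers can carry a Chrome/ token too, and UAs can be forged.

```python
import re
from typing import Dict

# Assumed minimum major versions with navigator.sendBeacon; verify against [2].
MIN_SENDBEACON_VERSION: Dict[str, int] = {"Chrome": 39, "Firefox": 31}

# Matches e.g. "Chrome/42.0.2311.135" or "Firefox/38.0", capturing the major version.
_UA_TOKEN = re.compile(r"\b(Chrome|Firefox)/(\d+)")


def likely_sendbeacon(user_agent: str) -> bool:
    """Best-effort guess that a UA string belongs to a sendBeacon browser."""
    match = _UA_TOKEN.search(user_agent)
    if match is None:
        return False  # unknown browser family: exclude to keep the data conservative
    browser, major = match.group(1), int(match.group(2))
    return major >= MIN_SENDBEACON_VERSION[browser]


chrome42 = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36")
ie11 = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
print(likely_sendbeacon(chrome42), likely_sendbeacon(ie11))  # True False
```

Excluding unrecognized browsers trades coverage for fidelity, which matches the goal of keeping only UAs where click-result delivery is expected to be reliable.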

-Adam

[1]

> SELECT count(*) FROM Search_11670541 WHERE timestamp >= '20150526' AND timestamp < '20150527' AND event_action = 'click-result' AND wiki = 'enwiki' AND userAgent LIKE '%Chrome/42%';
+----------+
| count(*) |
+----------+
|      112 |
+----------+
1 row in set (2.96 sec)

> SELECT count(DISTINCT event_userSessionToken) FROM Search_11670541 WHERE timestamp >= '20150526' AND timestamp < '20150527' AND event_action = 'click-result' AND wiki = 'enwiki' AND userAgent LIKE '%Chrome/42%';
+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    112 |
+----------------------------------------+
1 row in set (38.06 sec)

> SELECT count(DISTINCT event_userSessionToken) FROM Search_11670541 WHERE timestamp >= '20150526' AND timestamp < '20150527' AND wiki = 'enwiki' AND userAgent LIKE '%Chrome/42%';
+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    286 |
+----------------------------------------+
1 row in set (7.26 sec)

[2] https://developer.mozilla.org/it/docs/Web/API/Navigator/sendBeacon

On Tue, May 26, 2015 at 4:37 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Thanks Tomasz; great feedback! In order:

* yeah, top percentiles were a heavily-requested thing so I built them
in from the get-go. Similarly, I included mean/median so we have some
ability to avoid distorting the results when the distribution changes.
* The three-days-of-data thing is a known issue
(https://phabricator.wikimedia.org/T100056) and is next on my to-do
list for bugfixes :).
* Glad you like the interface! It's actually functional on mobile, too :D.
* Sample rate is crucial, yep. I'm reaching out to the authors of the
relevant EL schemas to find out how each was handled.
* Sessions < results opened makes sense in the event that users didn't
find what they wanted and went back to try again, but I'm not sure how
"session" is calculated; this is again something we lack transparency
around :(. Dan? You're the apps wizard.
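On the first and fourth points, here is a minimal Python sketch of the kind of summary involved: mean and median alongside top percentiles for load times, plus scaling a sampled event count back up by the schema's sample rate. It assumes load times are plain millisecond values and that the sample rate is expressed as a fraction (a 1-in-100 sample is 0.01). It uses statistics.quantiles (Python 3.8+) and is not necessarily how the dashboards compute things.

```python
import statistics
from typing import Dict, List


def load_time_summary(load_times_ms: List[float]) -> Dict[str, float]:
    """Mean, median and top percentiles; needs at least two values."""
    # n=100 yields 99 cut points: index 94 is the 95th percentile, 98 the 99th.
    cuts = statistics.quantiles(load_times_ms, n=100)
    return {
        "mean": statistics.mean(load_times_ms),
        "median": statistics.median(load_times_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def scale_for_sampling(sampled_count: int, sample_rate: float) -> float:
    """Estimate the unsampled total from a count taken at sample_rate."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return sampled_count / sample_rate


# Hypothetical values, for illustration only.
print(load_time_summary([float(ms) for ms in range(1, 101)]))
print(scale_for_sampling(50, 0.01))
```

Reporting the median next to the mean keeps a handful of very slow requests from dominating the headline number, while the p95/p99 values show that tail explicitly.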

As for supporting this: probably nothing at the moment, although it
would be good for Nik/Kevin to chip in on the relevant Phabricator
ticket (https://phabricator.wikimedia.org/T99762) to gauge how much
of a PITA a unified schema and the associated implementations would
be.

I'm sort of shocked to hear "we're supposed to be presenting this data
at the next metrics meeting": in the future, if there are instances
where data is going to be up for public scrutiny, would it be possible
to explicitly allocate time for that? My goal is to get us to the
point where our data is reliable all, or at least most, of the time,
and for a fragment of one person's time over two weeks, I think
progress on that is pretty fantastic. But prepping data for that kind
of event does change the priorities and which tasks should be worked
on.

If we want to present data, generally speaking, let's discuss what we
can show off. If we want to present the dashboards I'll put my all
into making the data at least something where we know the
deficiencies, if not something where we consider the deficiencies
tolerable.

On 26 May 2015 at 19:24, Tomasz Finc <tfinc@wikimedia.org> wrote:
> Thanks Oliver
>
> Early observations
>
> * Really happy to see top percentiles in load graphs
> * Mobile Web has only three days of data
> * Interface is simple and easy to use
> * We need to know the sample rate
> * Apps have fewer sessions than results pages opened
>
> Speaking over IRC it's clear that we don't have confidence in this
> data. We need to fix this and fix it quickly so that we can accurately
> plan our work. We're supposed to be presenting this data at the next
> metrics meeting and we're not at a point where I feel comfortable sharing
> our data let alone next steps.
>
> Oliver & Dan, what can the team do to support you guys on this? I want
> you guys to own this and know that we're here to support you.
>
> Should I be adding new feature requests and bugs to
> https://phabricator.wikimedia.org/tag/search-data-analytics/ ?
>
> --tomasz
>
> On Tue, May 26, 2015 at 11:04 AM, James Douglas <jdouglas@wikimedia.org> wrote:
>> This is a very exciting preview of things to come.
>>
>> Where are the data coming from? Am I just confused, or does "6 search
>> sessions per day" seem low?
>>
>> On Fri, May 22, 2015 at 2:35 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>>
>>> http://searchdata.wmflabs.org/ - boop! This was my Friday. Previously
>>> we were playing around with them and testing what we needed with a
>>> static snapshot; these dashboards will now update once a day with new
>>> information.
>>>
>>> It has turned up some bugs ("is the mobile schema just not running?")
>>> and there are more metrics to add. But for the time being, it's
>>> progress :)
>>>
>>> --
>>> Oliver Keyes
>>> Research Analyst
>>> Wikimedia Foundation
>>>
>>> _______________________________________________
>>> Wikimedia-search-private mailing list
>>> Wikimedia-search-private@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikimedia-search-private
>>
>

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics