http://searchdata.wmflabs.org/ - boop! This was my Friday. Previously we were playing around with the dashboards and testing what we needed against a static snapshot; they will now update once a day with new information.
It has turned up some bugs ("is the mobile schema just not running?") and there are more metrics to add. But for the time being, it's progress :)
Awesome. -m.
On Fri, May 22, 2015 at 5:35 PM, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
68,000 searches/day seems *really* low, even by my pretty low expectations - I would have guessed something like 1% of visitors, which (with 200M page views a day) means I'm off by an order of magnitude, more or less. Am I just that far off or is the data still a WIP, or some combination of the two?
Luis
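For reference, the back-of-the-envelope comparison above can be sketched as follows. The 1% rate is the guess from the message, and treating each of the 200M daily page views as one potential search is a simplifying assumption; the function name is illustrative only.

```python
import math


def compare_guess_to_observed(daily_page_views, guessed_search_rate, observed_per_day):
    """Return (expected searches/day, expected/observed ratio, orders of magnitude)."""
    expected = daily_page_views * guessed_search_rate
    ratio = expected / observed_per_day
    return expected, ratio, math.log10(ratio)


# 200M page views/day and a guessed 1% search rate, against the 68,000/day on the dashboard.
expected, ratio, orders = compare_guess_to_observed(200_000_000, 0.01, 68_000)
print(f"expected ~{expected:,.0f}/day; gap ~{ratio:.0f}x ({orders:.1f} orders of magnitude)")
```

With these inputs the gap is roughly 29x, so somewhat more than one order of magnitude.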
On Fri, May 22, 2015 at 2:56 PM, Michael Holloway mholloway@wikimedia.org wrote:
Comparing it to http://stats.grok.se/en/latest30/Special:Search, it does seem low indeed.
*Kind regards, Jan Ainali*
Executive Director, Wikimedia Sverige http://wikimedia.se 0729 - 67 29 48
*Imagine a world in which every single human being has free access to the sum of all human knowledge. That's what we're doing.* Become a member. http://blimedlem.wikimedia.se
2015-05-23 0:14 GMT+02:00 Luis Villa lvilla@wikimedia.org:
-- Luis Villa Sr. Director of Community Engagement Wikimedia Foundation *Working towards a world in which every single human being can freely share in the sum of all knowledge.*
On Fri, May 22, 2015 at 3:14 PM, Luis Villa lvilla@wikimedia.org wrote:
68,000 searches/day seems *really* low,
right, but I'm not sure search sessions per day is the same as the number of searches per day. Oliver, what definition of a "search session" do you use? How do you compute it?
Leila
Lotsa questions!
1. The number is EventLogging-sourced and thus sampled (I'll update the documentation to make this clear).
2. The search sessions are computed by the EventLogging schemas, which I played no role in writing and which are currently run by, ahh... 3 different teams. We're going to dig into the methodology and unify them now that we've got the framework for representing the results built.
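Since the figure comes from sampled logging, a raw count understates the real volume. A minimal sketch of scaling a sampled count back up, assuming uniform 1-in-N sampling at a known rate; the thread does not state the actual rates, so the numbers below are made-up placeholders and `estimate_total` is a hypothetical helper.

```python
def estimate_total(sampled_count: int, one_in_n: int) -> int:
    """Scale a count gathered under uniform 1-in-N sampling to an estimated total."""
    if one_in_n < 1:
        raise ValueError("one_in_n must be a positive integer")
    return sampled_count * one_in_n


# Hypothetical: 500 logged events at a 1-in-1000 sampling rate.
print(estimate_total(500, 1000))
```

If the three schemas sample at different rates, each would need its own factor before the per-platform numbers can be compared.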
On 22 May 2015 at 21:02, Leila Zia leila@wikimedia.org wrote:
This is a very exciting preview of things to come.
Where are the data coming from? Am I just confused, or does "6 search sessions per day" seem low?
Wikimedia-search-private mailing list Wikimedia-search-private@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search-private
Absolutely low; there be a bug! :D I love bugs - when they crop up they turn into better code. Fixing as we speak :)
On 26 May 2015 at 14:04, James Douglas jdouglas@wikimedia.org wrote:
Should be fixed!
On 26 May 2015 at 14:26, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
Thanks Oliver
Early observations
* Really happy to see top percentiles in load graphs
* Mobile Web has only three days of data
* Interface is simple and easy to use
* We need to know the sample rate
* Apps have fewer sessions than results pages opened
Speaking over IRC, it's clear that we don't have confidence in this data. We need to fix this, and fix it quickly, so that we can accurately plan our work. We're supposed to be presenting this data at the next metrics meeting, and we're not at a point where I feel comfortable sharing our data, let alone next steps.
Oliver & Dan, what can the team do to support you guys on this? I want you guys to own this and know that we're here to support you.
Should I be adding new feature requests and bugs to https://phabricator.wikimedia.org/tag/search-data-analytics/ ?
--tomasz
Thanks Tomasz; great feedback! In order:
* Yeah, top percentiles were a heavily requested thing, so I built them in from the get-go. Similarly, mean/median, so we have some ability to avoid distorting the results when the distribution changes.
* The 3 days of data thing is a known issue - https://phabricator.wikimedia.org/T100056 - and is next on my to-do list for bugfixes :).
* Glad you like the interface! It's actually functional on mobile, too :D.
* Sample rate is crucial, yep. I'm reaching out to the authors of the relevant EL schemas to find out how each was handled.
* Sessions < results opened makes sense in the event that users didn't find what they wanted and went back to try again, but I'm not sure how "session" is calculated; this is again something we lack transparency around :(. Dan? You're the apps wizard.
In supporting this: probably nothing at the moment, although it would be good if Nik/Kevin chipped in on the relevant Phabricator ticket (https://phabricator.wikimedia.org/T99762) to validate how much of a PITA the idea of a unified schema, and the associated implementations, would be.
I'm sort of shocked to hear "we're supposed to be presenting this data at the next metrics meeting": in the future, if there are instances where data is going to be up for public scrutiny, would it be possible to explicitly allocate time for that? My goal is to get us to the point where our data is reliable all, or at least most, of the time, and for a fragment of one person's time over two weeks, I think progress on that is pretty fantastic. But prepping data for that kind of event does change the priorities and what tasks should be worked on.
If we want to present data, generally speaking, let's discuss what we can show off. If we want to present the dashboards I'll put my all into making the data at least something where we know the deficiencies, if not something where we consider the deficiencies tolerable.
On 26 May 2015 at 19:24, Tomasz Finc tfinc@wikimedia.org wrote:
Thought I'd step in here. People who know the mechanics of the relevant logging are as follows:
Android: Dmitry Brant
Desktop: Bahodir (Baha) Mansurov
Mobile Web: Sam Smith & Baha
CC'ing them.
As I understand, Dan Garry's been talking with Baha already on the desktop piece.
It looks like on Chrome/42 UAs the clickthrough rate for a given suggest search form interaction is about 40%. [1]
A couple of patches are pending (JS for emitting an event on the new EL schema) that will make it possible to figure out how often form submission (ENTER/RETURN, tapping the magnifying glass) occurs within a form interaction as well.
Total form interactions (keys on userSessionToken...maybe a better name could be used) minus clickthroughs (click-result) minus form submission (submit-form in the pending patches on the new EL schema) would be a rough proxy of abandonment, I think.
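The subtraction described above can be sketched as follows. This assumes each form interaction ends in at most one of the two outcomes (a click-result or a submit-form), which the thread does not confirm; `abandonment_proxy` is a hypothetical helper, and the submit-form count is set to 0 only because that event is not logged yet.

```python
def abandonment_proxy(total_interactions: int, clickthroughs: int, form_submissions: int):
    """Return (abandoned interactions, abandonment share) as a rough proxy."""
    if total_interactions <= 0:
        raise ValueError("total_interactions must be positive")
    abandoned = total_interactions - clickthroughs - form_submissions
    if abandoned < 0:
        raise ValueError("outcomes exceed total interactions; check the inputs")
    return abandoned, abandoned / total_interactions


# Counts from the Chrome/42 queries in [1]; submit-form is not logged yet,
# so 0 is a placeholder and the result is only an upper bound.
abandoned, share = abandonment_proxy(286, 112, 0)
print(abandoned, f"{share:.1%}")
```

Once the pending patches land, the real submit-form count would replace the placeholder and bring the estimate down.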
I had heard that sendBeacon-capable UAs were likely to have greater success emitting the click-result event (i.e., when the user clicks on a suggestion from the form on the search panel on desktop) via mw.track. So it may make sense to confine queries for such analysis on desktop to known sendBeacon browsers [2] to increase the odds of high-fidelity data, in case there are outlier browsers that somehow manage to emit click-result events through means other than sendBeacon (it seems there may be some of these, assuming non-forged UAs).
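One way to restrict an analysis to sendBeacon-capable browsers is to filter on the user agent string, sketched below. The minimum versions are assumptions recalled from the compatibility table in [2] and should be checked there; the helper name and the simple regex are illustrative, and forged or unusual UAs will slip through.

```python
import re

# Assumed minimum major versions with navigator.sendBeacon support.
# Verify against the MDN compatibility table before relying on them.
MIN_SENDBEACON_VERSION = {"Chrome": 39, "Firefox": 31}

_UA_VERSION = re.compile(r"(Chrome|Firefox)/(\d+)")


def likely_supports_send_beacon(user_agent: str) -> bool:
    """Guess sendBeacon support from the first Chrome/NN or Firefox/NN token."""
    match = _UA_VERSION.search(user_agent)
    if match is None:
        return False  # unknown browser: exclude it to keep the data conservative
    browser, major = match.group(1), int(match.group(2))
    return major >= MIN_SENDBEACON_VERSION[browser]


ua = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36"
print(likely_supports_send_beacon(ua))
```

Excluding unrecognized browsers trades coverage for fidelity, which matches the intent of confining the analysis to known sendBeacon UAs.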
-Adam
[1]
SELECT count(*)
FROM Search_11670541
WHERE timestamp >= '20150526' AND timestamp < '20150527'
  AND event_action = 'click-result'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%';
+----------+
| count(*) |
+----------+
|      112 |
+----------+
1 row in set (2.96 sec)
SELECT count(DISTINCT event_userSessionToken)
FROM Search_11670541
WHERE timestamp >= '20150526' AND timestamp < '20150527'
  AND event_action = 'click-result'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%';
+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    112 |
+----------------------------------------+
1 row in set (38.06 sec)
SELECT count(DISTINCT event_userSessionToken)
FROM Search_11670541
WHERE timestamp > '20150526' AND timestamp < '20150527'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%';
+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    286 |
+----------------------------------------+
1 row in set (7.26 sec)
[2] https://developer.mozilla.org/it/docs/Web/API/Navigator/sendBeacon
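The "about 40%" figure follows from the query results in [1]: 112 distinct session tokens with a click-result out of 286 distinct tokens overall. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def clickthrough_rate(sessions_with_click: int, total_sessions: int) -> float:
    """Share of form-interaction sessions that produced at least one click-result."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return sessions_with_click / total_sessions


# Counts taken from the Chrome/42 enwiki queries in [1].
rate = clickthrough_rate(112, 286)
print(f"{rate:.1%}")  # prints 39.2%
```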
On Tue, May 26, 2015 at 4:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
+Corey Floyd, who recently added code for this stuff on iOS (slated for forthcoming 4.1.4 iOS Wikipedia app release).
On Wed, May 27, 2015 at 12:33 PM, Adam Baso abaso@wikimedia.org wrote:
It wasn't previously on iOS, so we've just been tracking Android? :/
On 28 May 2015 at 12:34, Adam Baso abaso@wikimedia.org wrote:
+Corey Floyd, who recently added code for this stuff on iOS (slated for forthcoming 4.1.4 iOS Wikipedia app release).
On Wed, May 27, 2015 at 12:33 PM, Adam Baso abaso@wikimedia.org wrote:
Thought I'd step in here. People who know the mechanics of the relevant logging are as follows:
Android: Dmitry Brant Desktop: Bahodir (Baha) Mansurov Mobile Web: Sam Smith & Baha
CC'ing them.
As I understand, Dan Garry's been talking with Baha already on the desktop piece.
It looks like on Chrome/42 UAs the clickthrough rate for a given suggest search form interaction is about 40%. [1]
A couple of patches are pending (JS for emitting event on new EL schema) that will make it possible to figure out how often form submission within a form interaction (ENTER/RETURN, tapping the magnifying class) occurs as well.
Total form interactions (keys on userSessionToken...maybe a better name could be used) minus clickthroughs (click-result) minus form submission (submit-form in the pending patches on the new EL schema) would be a rough proxy of abandonment, I think.
I had heard sendBeacon capable UAs were likely to have greater success emitting the click-result (i.e., when user clicks on a suggestion from the form on the search panel on desktop) event via mw.track, so it may make sense to confine queries for such analysis on desktop to known sendBeacon browsers [2] to increase the odds of high fidelity data just in case there are outlier browsers that manage to somehow emit click-result events through means other than sendBeacon (it seems there may be some of these, assuming non-forged UAs).
-Adam
[1]

SELECT count(*)
FROM Search_11670541
WHERE timestamp >= '20150526' AND timestamp < '20150527'
  AND event_action = 'click-result'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%';

+----------+
| count(*) |
+----------+
|      112 |
+----------+
1 row in set (2.96 sec)

SELECT count(DISTINCT event_userSessionToken)
FROM Search_11670541
WHERE timestamp >= '20150526' AND timestamp < '20150527'
  AND event_action = 'click-result'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%';

+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    112 |
+----------------------------------------+
1 row in set (38.06 sec)

SELECT count(DISTINCT event_userSessionToken)
FROM Search_11670541
WHERE timestamp > '20150526' AND timestamp < '20150527'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%';

+----------------------------------------+
| count(DISTINCT event_userSessionToken) |
+----------------------------------------+
|                                    286 |
+----------------------------------------+
1 row in set (7.26 sec)
[2] https://developer.mozilla.org/it/docs/Web/API/Navigator/sendBeacon
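For reference, the "about 40%" figure is the ratio of the two distinct-session counts in [1]: 112 / 286, or roughly 39.2%. Both counts can also be pulled in a single pass with conditional aggregation. Below is a minimal sketch using Python's built-in sqlite3 module against a made-up in-memory table. The table name `search_events`, the `session-start` action and the sample rows are illustrative assumptions; the real EventLogging table lives in MySQL, where the same `COUNT(DISTINCT CASE ...)` form should also work but has not been run here.

```python
import sqlite3

# One pass over the table: sessions with a click-result, and all sessions.
QUERY = """
SELECT
    COUNT(DISTINCT CASE WHEN event_action = 'click-result'
                        THEN event_userSessionToken END) AS clicked_sessions,
    COUNT(DISTINCT event_userSessionToken) AS all_sessions
FROM search_events
WHERE timestamp >= '20150526' AND timestamp < '20150527'
  AND wiki = 'enwiki'
  AND userAgent LIKE '%Chrome/42%'
"""


def clickthrough_rate(conn: sqlite3.Connection) -> float:
    """Return clicked sessions divided by all sessions (0.0 if there are none)."""
    clicked, total = conn.execute(QUERY).fetchone()
    return clicked / total if total else 0.0


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_events ("
        "timestamp TEXT, wiki TEXT, userAgent TEXT, "
        "event_action TEXT, event_userSessionToken TEXT)"
    )
    ua = "Mozilla/5.0 Chrome/42.0.2311.152 Safari/537.36"
    rows = [
        ("20150526010000", "enwiki", ua, "session-start", "s1"),
        ("20150526010005", "enwiki", ua, "click-result", "s1"),
        ("20150526020000", "enwiki", ua, "session-start", "s2"),
        ("20150526030000", "enwiki", ua, "session-start", "s3"),
        ("20150526030004", "enwiki", ua, "click-result", "s3"),
        ("20150526040000", "enwiki", ua, "session-start", "s4"),
        ("20150526050000", "enwiki", ua, "session-start", "s5"),
        # Falls outside the date window, so the query ignores it.
        ("20150527000001", "enwiki", ua, "click-result", "s6"),
    ]
    conn.executemany("INSERT INTO search_events VALUES (?, ?, ?, ?, ?)", rows)
    return conn
```

On the demo rows, two of five in-window sessions have a click-result, so `clickthrough_rate(make_demo_db())` gives 0.4. Conditional aggregation keeps the numerator and denominator on the same date filter, which avoids the small `>=` versus `>` mismatch between the separate queries above.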
On Tue, May 26, 2015 at 4:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Thanks Tomasz; great feedback! In order:
- Yeah, top percentiles were a heavily requested thing, so I built them in from the get-go. Similarly, mean/median, so we have some ability to avoid distorting the results when the distribution changes.
- The three days of data thing is a known issue - https://phabricator.wikimedia.org/T100056 - and is next on my to-do list for bugfixes :).
- Glad you like the interface! It's actually functional on mobile, too :D.
- Sample rate is crucial, yep. I'm reaching out to the authors of the relevant EL schemas to find out how each was handled.
- Sessions < results opened makes sense in the event that users didn't find what they wanted and went back to try again, but I'm not sure how "session" is calculated; this is again something we lack transparency around :(. Dan? You're the apps wizard.
On supporting this: probably nothing at the moment, although it would be good for Nik/Kevin to chip in on the relevant Phabricator ticket (https://phabricator.wikimedia.org/T99762) to validate how much of a PITA the idea of a unified schema, and the associated implementations, would be.
I'm sort of shocked to hear "we're supposed to be presenting this data at the next metrics meeting": in the future, if there are instances where data is going to be up for public scrutiny, would it be possible to explicitly allocate time for that? My goal is to get us to the point where our data is reliable all, or at least most, of the time, and for a fragment of one person's time over two weeks, I think progress on that is pretty fantastic. But prepping data for that kind of event does change the priorities and which tasks should be worked on.
If we want to present data, generally speaking, let's discuss what we can show off. If we want to present the dashboards, I'll put my all into making the data at least something where we know the deficiencies, if not something where we consider the deficiencies tolerable.
On 26 May 2015 at 19:24, Tomasz Finc tfinc@wikimedia.org wrote:
Thanks Oliver
Early observations
- Really happy to see top percentiles in load graphs
- Mobile Web has only three days of data
- Interface is simple and easy to use
- We need to know the sample rate
- Apps have fewer sessions than results page opened
Speaking over IRC, it's clear that we don't have confidence in this data. We need to fix this, and fix it quickly, so that we can accurately plan our work. We're supposed to be presenting this data at the next metrics meeting, and we're not at a point where I feel comfortable sharing our data, let alone next steps.
Oliver & Dan, what can the team do to support you guys on this? I want you guys to own this and know that we're here to support you.
Should I be adding new feature requests and bugs to https://phabricator.wikimedia.org/tag/search-data-analytics/ ?
--tomasz
On Tue, May 26, 2015 at 11:04 AM, James Douglas jdouglas@wikimedia.org wrote:
This is a very exciting preview of things to come.
Where are the data coming from? Am I just confused, or does "6 search sessions per day" seem low?
Wikimedia-search-private mailing list Wikimedia-search-private@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search-private
-- Oliver Keyes Research Analyst Wikimedia Foundation
On the apps, yep.
On Thu, May 28, 2015 at 9:43 AM, Oliver Keyes okeyes@wikimedia.org wrote:
It wasn't previously on iOS, so we've just been tracking Android? :/
On 28 May 2015 at 12:34, Adam Baso abaso@wikimedia.org wrote:
+Corey Floyd, who recently added code for this stuff on iOS (slated for the forthcoming 4.1.4 iOS Wikipedia app release).
On Tue, May 26, 2015 at 4:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
I'm sort of shocked to hear "we're supposed to be presenting this data at the next metrics meeting": in the future, if there are instances where data is going to be up for public scrutiny, would it be possible to explicitly allocate time for that? My goal is to get us to the point where our data is reliable all, or at least most, of the time, and for a fragment of one person's time over two weeks, I think progress on that is pretty fantastic. But prepping data for that kind of event does change the priorities and which tasks should be worked on.
If we want to present data, generally speaking, let's discuss what we can show off. If we want to present the dashboards, I'll put my all into making the data at least something where we know the deficiencies, if not something where we consider the deficiencies tolerable.
This was brought up as a next step during a number of discussions between the team before I left. Let's focus on what it would take and work with the team to do it. I don't want to present data we have no confidence in, but we need to start showcasing the stories that we're learning and pushing search forward for our users.
Thank you for your work on this, Oliver
--tomasz
Indeed, but next step != explicit deliverables. Kevin, could you put together a meeting to work out what's specifically being asked for here? Then Dan can prioritise and schedule it as part of the standard process.
On 27 May 2015 at 17:04, Tomasz Finc tfinc@wikimedia.org wrote:
This was brought up as a next step during a number of discussions between the team before I left. Let's focus on what it would take and work with the team to do it. I don't want to present data we have no confidence in, but we need to start showcasing the stories that we're learning and pushing search forward for our users.
Thank you for your work on this, Oliver
--tomasz
OK, who on the search team needs to be there?
On Wed, May 27, 2015 at 5:33 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Indeed, but next step != explicit deliverables. Kevin, could you put together a meeting to work out what's specifically being asked for here? Then Dan can prioritise and schedule it as part of the standard process.
Different Kevin :P
On 27 May 2015 at 20:58, Kevin Leduc kevin@wikimedia.org wrote:
OK, who on the search team needs to be there?