While doing CR for https://gerrit.wikimedia.org/r/#/c/232896/3/modules/ext.wikimediaEvents.sear... I came to have serious doubts about this approach.
In brief, it attempts to track user satisfaction with search results by measuring how long people stay on pages. It does that by appending fromsearch=1 to links for 0.5% of users. However, this makes those page views uncached, which increases HTML load time by a factor of 4-5 and, consequently, pushes even short pages' first paint outside the comfort zone of 1 second. And that's measured from the office, with a ping of 2-3 ms to ulsfo. My concern is that we're trying to measure the very metric we're screwing with, which makes the experiment inaccurate.
Can we come up with a less intrusive way of measuring, or alter the requirements of the experiment?
Can you think of a way of consistently identifying a user from page to page, but only in the trace following them landing on the search page, that does not include page parameters?
On 26 August 2015 at 16:30, Max Semenik maxsem.wiki@gmail.com wrote:
While doing CR for https://gerrit.wikimedia.org/r/#/c/232896/3/modules/ext.wikimediaEvents.sear... I came to have serious doubts about this approach.
In brief, it attempts to track user satisfaction with search results by measuring how long people stay on pages. It does that by appending fromsearch=1 to links for 0.5% of users. However, this makes those page views uncached, which increases HTML load time by a factor of 4-5 and, consequently, pushes even short pages' first paint outside the comfort zone of 1 second. And that's measured from the office, with a ping of 2-3 ms to ulsfo. My concern is that we're trying to measure the very metric we're screwing with, which makes the experiment inaccurate.
Can we come up with a less intrusive way of measuring, or alter the requirements of the experiment?
-- Best regards, Max Semenik ([[User:MaxSem]])
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
Nice catch Max. Thanks for reporting it. Do you have any suggestions for how we could alleviate this issue?
Thanks, Dan
We couldn't come up with a solution when discussing this with Erik, hence this thread.
On Wed, Aug 26, 2015 at 2:11 PM, Dan Garry dgarry@wikimedia.org wrote:
Nice catch Max. Thanks for reporting it. Do you have any suggestions for how we could alleviate this issue?
Thanks, Dan
-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation
We could make the new experience equally uncomfortable for both A and B, by uncaching the results in both cases. Not ideal, but at least would be apples to apples.
Kevin Smith Agile Coach, Wikimedia Foundation
On Wed, Aug 26, 2015 at 2:30 PM, Max Semenik maxsem.wiki@gmail.com wrote:
We couldn't come up with a solution when discussing this with Erik, hence this thread.
Aren't we uncached here anyway? Special pages and all.
-Chad
On Aug 26, 2015 3:48 PM, "Kevin Smith" ksmith@wikimedia.org wrote:
We could make the new experience equally uncomfortable for both A and B, by uncaching the results in both cases. Not ideal, but at least would be apples to apples.
Kevin Smith Agile Coach, Wikimedia Foundation
On Wed, Aug 26, 2015 at 3:58 PM, Chad Horohoe chorohoe@wikimedia.org wrote:
Aren't we uncached here anyway? Special pages and all.
-Chad
Actually, the events we are recording here measure the user's interaction with the pages they found. The current idea is to add a query parameter to all search result links (only for users in the test, from JavaScript) so that when they land on a page we know they came from search and can start collecting interaction events.
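The link-tagging idea described above can be sketched roughly as follows. This is only an illustrative Python sketch (the actual patch does this in JavaScript in the browser). The function names, the hash-based bucketing, and the example URL are assumptions for illustration; only the fromsearch=1 parameter and the 0.5% rate come from this thread.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SAMPLE_RATE = 0.005  # 0.5% of users, as mentioned in the thread


def in_test_bucket(session_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically decide whether a session is in the test.

    The hashing scheme is a hypothetical choice, not the one used in the patch.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < rate


def tag_link(url: str, param: str = "fromsearch", value: str = "1") -> str:
    """Return url with param=value set exactly once in its query string."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != param
    ]
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def tag_search_results(links, session_id: str, rate: float = SAMPLE_RATE):
    """Tag every result link, but only for sessions that are in the test."""
    if not in_test_bucket(session_id, rate):
        return list(links)
    return [tag_link(link) for link in links]
```

For example, tag_link("https://en.wikipedia.org/wiki/Foo") gives "https://en.wikipedia.org/wiki/Foo?fromsearch=1", and it is exactly this changed URL that makes the page view miss the cache.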
It may not be the *appropriate* parameter name to use, but this sort of technique may help:
https://wikitech.wikimedia.org/wiki/Provenance
You'd want to look at the *current* VCL in templates/varnish in the operations repo to see how it's presently done for *wprov*.
-Adam
On Wed, Aug 26, 2015 at 4:00 PM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
Actually, the events we are recording here measure the user's interaction with the pages they found. The current idea is to add a query parameter to all search result links (only for users in the test, from JavaScript) so that when they land on a page we know they came from search and can start collecting interaction events.
That's an excellent idea, thanks Adam. Talked about it with Max and updated the patch; I think this will work out just fine.
Erik B.
On Wed, Aug 26, 2015 at 4:23 PM, Adam Baso abaso@wikimedia.org wrote:
It may not be the *appropriate* parameter name to use, but this sort of technique may help:
https://wikitech.wikimedia.org/wiki/Provenance
You'd want to look at the *current* VCL in templates/varnish in the operations repo to see how it's presently done for *wprov*.
-Adam
Thanks so much Adam! :). And thank you to Max for surfacing the issue - this kind of consistent and practical attention to our end goal, which is not KPIs about user satisfaction but actual user satisfaction, is precisely what we need, and I am glad to see it coming out.
On 26 August 2015 at 19:42, Erik Bernhardson ebernhardson@wikimedia.org wrote:
That's an excellent idea, thanks Adam. Talked about it with Max and updated the patch; I think this will work out just fine.
Erik B.
Hi!
We couldn't come up with a solution when discussing this with Erik, hence this thread.
I wonder if this could be solved by some kind of redirect configuration, i.e. the initial URL U is modified to U' so that when U' is accessed, the access is recorded and then redirected to the cacheable URL U. This way, we have a record of U' being accessed, making it available for analytics, but U is still served from cache. That would, of course, work best if we could make the frontend servers do the redirect, and one redirect should be much faster than loading a non-cached page. I'm not sure which form of U' would be best, or whether this is feasible; just putting it out for discussion.
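To make the redirect idea concrete, here is a rough Python sketch. It assumes, purely for illustration, that U' is simply U with a fromsearch=1 query parameter appended. The handler name, the in-memory log, and the choice of a 302 status are likewise assumptions, not a description of how the frontend servers work.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAM = "fromsearch"  # assumed form of U'; the thread leaves this open


def handle_request(url: str, access_log: list):
    """Return (status, location) for a request.

    A request for U' is recorded and redirected to U; a request for U is
    reported as servable directly (in practice, from cache).
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == TRACKING_PARAM for key, _ in query):
        return 200, url  # plain U: nothing to record
    access_log.append(url)  # record that U' was accessed, for analytics
    remaining = [(k, v) for k, v in query if k != TRACKING_PARAM]
    clean = urlunsplit(parts._replace(query=urlencode(remaining)))
    return 302, clean  # send the client on to the cacheable U
```

One thing this sketch makes visible: after the redirect, the landing page itself no longer sees the parameter, so this approach records the click but would need some other mechanism to log later interaction on the page.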
Max Semenik, 26/08/2015 22:30:
It does that by appending fromsearch=1 to links for 0.5% of users.
Hi all,
in determining where a user is coming from, the "referer" header would be much more advisable in general, but maybe I'm losing the context here.
Also, I think this thread should be x-posted to ops@ and/or relevant tickets should be brought to our attention :)
Cheers
Joe
On Thu, Aug 27, 2015 at 9:13 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Max Semenik, 26/08/2015 22:30:
It does that by appending fromsearch=1 to links for 0.5% of users.
See also https://phabricator.wikimedia.org/T106594
(Cross-posting to ops@ as requested. This is in regard to an EventLogging schema[1] applied in JavaScript to track user behavior on pages they find via internal search, in order to measure the quality of the search results[2][3].)
The referrer would be nice to use, but we are trying to track more than just a single click away from Special:Search. We are trying to track all pages the user visits (for 10 minutes) by clicking from search results to one page and then another. The other difficulty is that we are trying to track only the pages the user reached from our search, and not pages they found by repeating their search on Google and clicking through to Wikipedia.
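As a rough illustration of the 10-minute tracking window described above, the Python sketch below keeps a start time for a search session and only logs page views that fall inside the window. The class name, its fields, and the use of explicit timestamps are assumptions for illustration; how the session is actually carried from page to page is not shown here.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 10 * 60  # the 10-minute window mentioned above


@dataclass
class SearchSession:
    """Hypothetical record of one click-through from Special:Search."""

    started_at: float  # time of the click from the search results, in seconds
    events: list = field(default_factory=list)

    def is_active(self, now: float) -> bool:
        """True while `now` is still inside the tracking window."""
        return 0 <= now - self.started_at < WINDOW_SECONDS

    def record_visit(self, page: str, now: float) -> bool:
        """Log a page view if the window is still open; report whether it was logged."""
        if not self.is_active(now):
            return False
        self.events.append((page, now - self.started_at))
        return True
```

A visit 5 seconds after the click is logged with an offset of 5.0, while a visit 600 seconds or more after the click is ignored.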
Using the wprov[4] query parameter to declare the provenance as desktop search should avoid all the issues mentioned above. Varnish strips wprov, so the Varnish cache remains unfragmented. The JavaScript can pick up on the parameter and log the necessary events.
[1] https://meta.wikimedia.org/wiki/Schema:TestSearchSatisfaction2
[2] https://phabricator.wikimedia.org/T109482
[3] https://gerrit.wikimedia.org/r/#/c/232896/
[4] https://wikitech.wikimedia.org/wiki/Provenance
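The effect of stripping wprov before the cache lookup can be illustrated with a small Python sketch. It is not the actual VCL (which lives in the operations repo and is not reproduced here); the function names and the example value "srch1" are assumptions for illustration.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def cache_key(url: str, stripped=("wprov",)) -> str:
    """Return the URL as the cache would see it, with stripped parameters removed."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in stripped
    ]
    # Fragments are never sent to the server, so they are dropped from the key.
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def provenance(url: str):
    """Return the wprov value as client-side code would read it, or None."""
    values = parse_qs(urlsplit(url).query).get("wprov")
    return values[0] if values else None
```

With this, "https://en.wikipedia.org/wiki/Foo?wprov=srch1" and "https://en.wikipedia.org/wiki/Foo" share one cache key, whereas a URL carrying fromsearch=1 keeps that parameter in its key and is cached separately. The page's own script can still read the value through provenance(), since the browser's address bar keeps the original URL.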
On Thu, Aug 27, 2015 at 3:10 AM, Giuseppe Lavagetto glavagetto@wikimedia.org wrote:
Hi all,
in determining where a user is coming from, the "referer" header would be much more advisable in general, but maybe I'm losing the context here.
Also, I think this thread should be x-posted to ops@ and/or relevant tickets should be brought to our attention :)
Cheers
Joe