Thank you, Tilman. This is very helpful.

Leila


On Thu, Feb 8, 2018 at 1:50 AM, Tilman Bayer <tbayer@wikimedia.org> wrote:
Hi Leila,

On Wed, Jan 17, 2018 at 10:46 AM, Leila Zia <leila@wikimedia.org> wrote:
Hi Sam,

On Wed, Jan 17, 2018 at 1:51 AM, Sam Smith <samsmith@wikimedia.org> wrote:

> IMO #1 is preferable from the operations and performance perspectives as the
> response is always served from the edge and includes very few headers,
> whereas the request in #2 may be served by the application servers if the
> user is logged in (or in the mobile site's beta cohort). However, the
> requests in #2 are already

It seems the sentence above got cut off. Can you resend it?

> We're currently considering recording page interactions when previews are
> open for longer than 1000 ms. We estimate that this would increase overall
> web requests by 0.3% [3].

Can you say some words about how the 1000 ms threshold is chosen?
This is a good question; sorry that it got buried earlier. (It's kind of orthogonal, though, to the technical instrumentation questions that have been the focus of attention: as indicated by the capital X in Sam's initial post, we can still decide to fine-tune that threshold right now; it's just a parameter change.)

This kind of threshold necessarily needs to be set somewhat arbitrarily, in the sense that there will always be either cases where some content was already read/perceived in a preview card shown for a shorter time, or cases where a reader needed a longer time to consume any content from the card. We picked a time by which we can be reasonably certain that at least some readers can consume content (read some words, perceive an image). It's not the result of an exact calculation to find the provably best limit. But we did have a look at the frequency of the different user actions over time during the first seconds after they start to hover over a link. In case you're interested, I recently updated those charts with better quality data from our latest two tests, e.g.:
https://phabricator.wikimedia.org/F13134460 (a zoomed-in look at the same histogram)

The following is just eyeballing and thinking aloud, but one way to view this histogram is as the sum of several distributions associated with different user intentions:
1. Most of the time when our instrumentation registered the cursor moving over a link, the user was just on their way to a different part of the screen (with no intention of either clicking that link or viewing the preview). That's mostly the huge yellow spike on the left - "dwelledButAbandoned" meaning that the cursor left the link without either clicking it or causing a preview to show. The feature involves a 500ms delay before the preview card begins to display, so that we don't bother that group too much. (Only the right tail end of that distribution, folks moving the cursor very slowly, will be affected, where things morph from yellow into purple.) 
2. Then there are users who want to click the link without viewing the preview, forming all of the green part left of 500ms and an unknown portion to the right of it (after the card starts to show, some of these "open" actions will instead happen after the user intentionally viewed the card, case 3.).
3. And there are users who intentionally view a preview. The little bump in the purple part ("dismissed" meaning that the preview was shown and then closed by moving the cursor away) at about 1100ms indicates that the distribution for that user group also peaks somewhere there, maybe a few hundred ms to the right. That would mean that our 1000ms threshold (i.e. only counting the part of the histogram right of 1500ms = 500ms + 1000ms as seen previews) is actually right of that distribution's peak. In other words, the threshold is in some sense quite conservative.
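
To make the arithmetic in case 3 concrete, here is a minimal sketch of how the two timing parameters combine. The function and constant names are hypothetical and are not taken from the actual instrumentation; only the 500ms display delay and the 1000ms threshold come from this thread, and the strict "longer than" comparison is an assumption based on Sam's wording.

```python
# Hypothetical names; only the two numeric values come from the thread.
PREVIEW_DELAY_MS = 500     # delay before the preview card begins to display
SEEN_THRESHOLD_MS = 1000   # proposed minimum time the card must stay open


def preview_visible_ms(hover_ms: int) -> int:
    """Time the card was actually on screen for a hover of hover_ms."""
    return max(0, hover_ms - PREVIEW_DELAY_MS)


def counts_as_seen(hover_ms: int) -> bool:
    """True if the card was open for longer than the threshold.

    Assumes a strict comparison ("longer than 1000 ms").
    """
    return preview_visible_ms(hover_ms) > SEEN_THRESHOLD_MS
```

Under these assumptions, a hover of 1100ms (roughly where the purple bump sits) leaves the card visible for 600ms and would not be counted, while anything beyond 1500ms would be.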

Like I said, this is all of course still a bit handwavy; it involves some assumptions about the form of these distributions, as well as disregarding some other information for now that can give a fuller picture (in particular the analogous histogram for link interaction behavior without page previews being active, which we also have from our A/B tests). 
 
Is this based (partially) on looking at traces where a user-agent goes to
a page and returns to the "source" article?
We did an analysis of that user behavior, but not regarding the timing question; rather, it was about finding out how much of the reduction in pageviews comes from reduced usage of the back button. I'm not sure how directly we can compare the action of loading an entire new page and then going back (two clicks that also involve moving the mouse cursor to an entirely different part of the screen - the back button - in between) with the action of hovering over a link and then moving the cursor away for a small distance to close the preview; it seems to me that the latter involves much less friction - which is kind of the whole point of the previews feature ;)

As indicated, we already picked a value for the threshold that we are quite comfortable with. But if you are still interested in this question and have some spare time, I'm more than happy to chat about it further off-list.


Thanks,
Leila

>
> [0] https://lists.wikimedia.org/pipermail/analytics/2015-March/003633.html
> [1]
> https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb;1bce79d58e03bd02888beef986c41989e8345037$269
> [2] https://wikitech.wikimedia.org/wiki/X-Analytics
> [3] https://phabricator.wikimedia.org/T184793#3901365
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>




--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB