Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

19 Jan 2018

...
   For virtual pageviews, people will probably be more interested in
reports that belong to the first group (summing them up with normal
pageviews, breaking them down along the dimensions that are relevant for
web traffic, counting them for a given URL, etc.).

Ah! OK, I get this use case now. I might not be able to comment much on
this, then. I think this totally changes the meaning of a pageview.
Perhaps that is what you want? If so, this is outside the realm of my
opinionatedness. :)

However, IF you do convince folks to change the meaning of ‘pageview’ to
include ‘previews’, then we might be able to compromise. All I object to is
more filtering of webrequests. :) The rest of this email might be moot if
we don’t change the ‘pageview definition’, but I’ll continue anyway…

The page previews data could come in as events. Augmenting the generated
pageviews table from more incoming event sources sounds more flexible than
adding more filtering logic to webrequests. I’d defer to the Analytics team
members who would be implementing this, though; I might be wrong.

In my ideal, pageviews and page_previews would both be separate event
streams. These would be imported as-is into Hive tables, but would also be
available in Kafka. You could join them together into a broader ‘content
consumption’ dataset, either in Hadoop with batch jobs or closer to realtime
with streaming jobs. (If this is done right, you can even use the same code
for both cases.) If we had a good stream processing system here, I might
suggest that we move pageview filtering to a more realtime setup and
generate a derived pageview stream in Kafka. We’d then use that as the
source of pageviews in Hadoop. Anyway, this is my ideal setup, not what we
have now! But we might one day (in the next FY???), and intaking events
for page previews and other counters will help us migrate to this kind
of architecture later.
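A minimal Python sketch of the "same code for both cases" idea, under loudly stated assumptions: the event fields (`dt`, `type`, `page`) are hypothetical, not the real pageview or page preview schemas, and the "join" is modelled here as a time-ordered union of the two streams, which is only one possible reading. Because `heapq.merge` is lazy, the same functions can run over finite lists (a batch job over Hive tables) or over unbounded iterators (a consumer reading from Kafka).

```python
from collections import Counter
from heapq import merge
from typing import Dict, Iterable, Iterator

# Hypothetical event shape; field names are illustrative assumptions,
# not the real pageview or page preview schemas:
#   {"dt": "2018-01-19T06:05:00Z", "type": "pageview" | "page_preview", "page": "Foo"}
Event = Dict[str, str]


def content_consumption(*streams: Iterable[Event]) -> Iterator[Event]:
    """Interleave time-ordered event streams into one 'content consumption' stream.

    heapq.merge is lazy, so the same call works on finite lists (batch) and on
    unbounded iterators (streaming). Each input must already be sorted by "dt";
    ISO-8601 strings in one fixed format sort correctly as plain strings.
    """
    return merge(*streams, key=lambda e: e["dt"])


def update_counts(counts: Counter, event: Event) -> Counter:
    """Fold one event into per-(page, type) counts.

    A batch job calls this in a loop over a whole table; a streaming job calls
    it once per arriving message. The counting logic is shared by both.
    """
    counts[(event["page"], event["type"])] += 1
    return counts


if __name__ == "__main__":
    pageviews = [{"dt": "2018-01-19T00:00:01Z", "type": "pageview", "page": "Foo"}]
    previews = [
        {"dt": "2018-01-19T00:00:00Z", "type": "page_preview", "page": "Foo"},
        {"dt": "2018-01-19T00:00:02Z", "type": "page_preview", "page": "Bar"},
    ]
    counts: Counter = Counter()
    for event in content_consumption(pageviews, previews):
        update_counts(counts, event)
    print(counts)
```

Keeping the per-event update separate from the source of events is what lets one function serve both a Hadoop batch job and a streaming job; only the loop that feeds it changes.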

...
  Is that different from preprocessing them via EventLogging? Either way
you take an HTTP request and end up with a Hadoop record. Is there
something that makes that process a lot more costly for normal pageviews
than for EventLogging beacon hits?

...
 From a hardware perspective, only in that the stream of events is much
smaller, so there’s less wasted, repeated I/O. From an engineering-time
perspective, if we use the webrequest tagging system to do this, I think
we’re good, but only in the short term. In the long term, it hides the
complexity of maintaining the definition of a pageview, page preview, or
any other ‘tagged’ webrequest inside complicated Java logic that is
really only usable in Hadoop. I’m mainly objecting because we want to
draw a line and stop doing this kind of thing. Doing this for page previews
now might be OK if we really really really have to (although Nuria might
not agree ;) ), but ultimately we need to push this kind of interaction
logic out to feature developers, who have more control over it.

The Analytics team wants to build infrastructure that makes it easy for
developers to measure their product usage, not implement the measuring
logic ourselves.

On Fri, Jan 19, 2018 at 6:05 AM, Adam Baso <abaso(a)wikimedia.org> wrote:

...
  Thanks, Sam. Nuria, that's what I was getting at: if using the EL JS
 library, would some sort of new method be needed so that these impressions
 aren't undercounted?

 On Fri, Jan 19, 2018 at 4:49 AM, Sam Smith <samsmith(a)wikimedia.org> wrote:

  On Thu, Jan 18, 2018 at 9:57 PM, Adam Baso <abaso(a)wikimedia.org> wrote:

  Adding to this, one thing to consider is DNT: is there a way to invoke
 EL so that such traffic is appropriately imputed or something?

 The EventLogging client respects DNT [0]. When the user enables DNT,
 mw.eventLog.logEvent is a NOP.

 I don't see any mention of DNT in the Varnish VCLs around the /beacon
 endpoint or otherwise, but it may be handled elsewhere. While it's unlikely,
 there's nothing stopping a client from sending a well-formed request to the
 /beacon/event endpoint directly [1], ignoring the user's choice.

 -Sam

 [0] https://phabricator.wikimedia.org/diffusion/EEVL/browse/master/modules/ext.eventLogging.core.js;4480f7e27140fcb8ae915c1755223fd7a5bab9b9$251
 [1] https://phabricator.wikimedia.org/diffusion/EEVL/browse/master/modules/ext.eventLogging.core.js;4480f7e27140fcb8ae915c1755223fd7a5bab9b9$215
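Sam's point about DNT can be illustrated with a toy Python model. It is not the real `mw.eventLog` API, `ext.eventLogging.core.js`, or the Varnish `/beacon/event` handling; `beacon_endpoint` and `make_log_event` are hypothetical stand-ins. The DNT decision lives only in the client-side wrapper, so a request sent straight to the endpoint is accepted regardless of the user's choice.

```python
from typing import Callable, Dict, List

# Toy model only. These names are hypothetical stand-ins, not the real
# mw.eventLog API, ext.eventLogging.core.js, or the Varnish /beacon/event handling.
Event = Dict[str, str]

received: List[Event] = []  # everything the modelled endpoint has accepted


def beacon_endpoint(event: Event) -> None:
    """Server side: this model accepts any well-formed event and does not check DNT."""
    received.append(event)


def make_log_event(dnt_enabled: bool, send: Callable[[Event], None]) -> Callable[[Event], None]:
    """Client side: return a logger that is a no-op (NOP) when DNT is enabled."""
    if dnt_enabled:
        return lambda event: None  # drop the event; nothing is sent
    return send


if __name__ == "__main__":
    make_log_event(True, beacon_endpoint)({"schema": "Example"})   # dropped client-side
    make_log_event(False, beacon_endpoint)({"schema": "Example"})  # sent normally
    beacon_endpoint({"schema": "Example"})  # direct request bypasses the client check
    print(len(received))  # 2
```

In this model only two of the three calls reach the endpoint, and one of those is the direct request that skipped the client-side check, which is the gap Sam describes.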

 _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics
