Re: [Analytics] Virtual file view hack for Media Viewer views

5 Feb 2015

I'm not sure why a beacon would have to be a dummy HTML file, thus confusing PV
stats.

Could it not be a dummy image request, more in line with the one-pixel images that are
often used?

This way Oliver can relax, go on vacation for real, without keeping a close watch over PV
definitions.

From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Dan Andreescu
Sent: Thursday, February 05, 2015 22:43
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] Virtual file view hack for Media Viewer views

Nuria & Erik: you're totally right, I keep forgetting this problem is more
complicated than I think.

So we should figure out how this statsv magic thing works and see if we can use it here.

On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:

[Oliver] My point was more that we should try to avoid traffic-generating
requests that exist solely as a hack for analytics purposes;
...
[Dan] Is this a potential solution to Oliver's concern:

I disagree that we should be concerned about "beacons" to identify preloads. Just as
beacons exist for ads or stats, using one to identify preloads doesn't seem far-fetched
(I have certainly used similar code before and it did its job).

Note that EL works in a similar fashion: it requests a "fake" image from Varnish, to
which we answer with a 204. The reason we have such code is that we do not have a
specific endpoint or domain where requests of this type could go. Pretty much
everything requested by our users and ourselves ends up in Varnish.
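
As a rough illustration of the "answer with a 204" pattern described above, here is a minimal sketch in Python. It is not the EventLogging or Varnish implementation; the function name beacon_app and the Cache-Control header are assumptions made for this example only.

```python
def beacon_app(environ, start_response):
    """Hypothetical WSGI app: acknowledge any beacon hit with an empty 204.

    The request itself is the payload (it ends up in the request log), so
    nothing needs to be sent back to the client.
    """
    # Assumption: "no-store" keeps intermediaries from caching the beacon,
    # so every view produces a logged hit.
    start_response("204 No Content", [("Cache-Control", "no-store")])
    return [b""]
```

For local experimentation the app could be served with wsgiref.simple_server.make_server from the standard library; in the setup discussed in this thread the 204 would come from the caching layer rather than from application code.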

On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:

Is this a potential solution to Oliver's concern:

For "real" image views, add an X-Analytics header value of
"real-view=true" to the request itself?

If that's not feasible, we should look into using statsv for this (not sure how that
works) or having this be a different Kafka topic that is not consumed into HDFS.
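
To make the header idea concrete, here is a small Python sketch of how a log consumer might pick out requests tagged this way. It assumes the X-Analytics value is a semicolon-separated list of key=value pairs; the function names are hypothetical and only the "real-view=true" pair comes from the suggestion above.

```python
def parse_x_analytics(header: str) -> dict:
    """Split an X-Analytics style value into a dict.

    Assumption: pairs are separated by ';' and written as key=value.
    """
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:  # skip empty or malformed fragments
            fields[key] = value
    return fields


def is_real_view(header: str) -> bool:
    """True when the request was tagged with real-view=true."""
    return parse_x_analytics(header).get("real-view") == "true"
```

A consumer would then count only the requests for which is_real_view(...) holds and ignore preloads that carry no such tag.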

On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:

I created a card -- modify as desired:

https://trello.com/c/HMgVD4mz

-Toby

On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:

It turns out that the media viewer (on desktop; don't know about mobile) does a lot of
caching so just because an image is loaded from swift, it doesn't mean it is viewed.
We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs
to be added eventually. Let's leave it out of scope for now.

-Toby

On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:

We want to include these files in the pageview definition? :/.

My point was more that we should try to avoid traffic-generating
requests that exist solely as a hack for analytics purposes; it's
artificial work for both users and us. If this is the only way of
doing things that's totally fine.

On 5 February 2015 at 11:38, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
...
 Hi Gergo -- I like this idea.  As far as capacity, any EL-Hadoop based
 solution would be basically doing the same thing as you propose.

 Can you please run it past ops (especially the 404 vs. 204 part)?

 Oliver -- the issue is that we'd like to figure out a way to provide
 accurate views of the media files; because of client side caching, we can't
 use the current requests. But your point is a good one -- we'll need to add
 this to the PV definition.

 -Toby

 On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:

 A nice theory, but if they appear in the webrequest table (presumably
 they would, and we're not creating an entirely new set of varnishes
 for the transmission of dummy images?) they have to be factored in.
 Again, however, the new definition automatically filters them by
 checking the webrequest source and MIME type, so this is not a
 problem, as I originally stated.

 On 5 February 2015 at 08:10, Erik Zachte <ezachte(a)wikimedia.org> wrote:
 Oliver, this is not about pageviews, but about media file views.

 These will be collected and dumped separately, as per

 https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_coun…
 .

 Erik

 From: analytics-bounces(a)lists.wikimedia.org
 [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
 Sent: Wednesday, February 04, 2015 22:28
 To: A mailing list for the Analytics Team at WMF and everybody who has an
 interest in Wikipedia and analytics.
 Subject: Re: [Analytics] Virtual file view hack for Media Viewer views

 "We would add a rule to Varnish to make sure it does not try to look up such
 requests in Swift but returns a 404 immediately."

 I bet ops would like it a lot better if this is a 204, and it kind of makes
 sense as it is the code used for beacons and such. Otherwise they might get
 alarms on 404s increasing.

 On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:

 Not really; the new pageviews definition wouldn't include those files
 anyway. It seems silly, though, to be deliberately generating a large
 amount of automated noise and client requests for this :/.

 On 4 February 2015 at 15:00, Gergo Tisza <gtisza(a)wikimedia.org> wrote:
  Hi all,

 Erik Zachte is working on file view stats and is looking for a way to track
 Media Viewer image views (for which there is no 1:1 relation between server
 hits and actual image views); after some back and forth in
 https://phabricator.wikimedia.org/T86914 I proposed the following hack:

 Whenever the JavaScript code in MediaViewer determines that an image view
 happened (e.g. an image has been displayed for a certain amount of time), it
 makes a request to a certain fake image, say

 upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>

 These hits can then be easily filtered from the Varnish request logs and added
 to the normal requests. We would add a rule to Varnish to make sure it does not
 try to look up such requests in Swift but returns a 404 immediately.

 This would be a temporary workaround until there is a proper way to log
 virtual image views, such as EventLogging with a non-SQL backend.

 Do you see any fundamental problem with this?
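
The proposal above can be sketched in a few lines of Python: one helper builds the fake thumbnail path, another recognises it again in a list of request paths. Only the path prefix and the <size>px-thumbnail.<ext> shape come from the proposal; the function names, the percent-encoding of the image name, and the numeric size are assumptions made for this example, not the MediaViewer or log-processing code.

```python
import re
from urllib.parse import quote, unquote

# Path prefix as given in the proposal.
PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"
_PATTERN = re.compile(
    r"^" + re.escape(PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name: str, size: int, ext: str) -> str:
    """Build the fake path a client would request once a view is counted."""
    # Assumption: the image name is percent-encoded so it stays one segment.
    return f"{PREFIX}{quote(image_name, safe='')}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str):
    """Return (image_name, size, ext) for a virtual-view path, else None."""
    match = _PATTERN.match(path)
    if match is None:
        return None
    return unquote(match["name"]), int(match["size"]), match["ext"]


def count_virtual_views(paths) -> dict:
    """Tally virtual views per image name, ignoring all other requests."""
    counts = {}
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            counts[parsed[0]] = counts.get(parsed[0], 0) + 1
    return counts
```

Because the marker sits in the path itself, ordinary thumbnail requests never match the pattern, which is what makes the filtering step cheap.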

  _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation

