Dan,
The issue is that MediaViewer always sends two images, the one a user clicked on and the
next in that same article. The second image might be shown to the user later when she
clicks the right arrow, at which point a new image is prefetched. So only later it will
become clear if the image is actually shown to the user.
We thought of differentiating between the explicitly requested first image and implicitly
sent follow-up images by adding a new X-Analytics parameter for prefetched images (and
simply ignoring those), but that would harm our server cache, as two versions of the same
image would be stored due to slightly different URLs.
Several variations are still under discussion: different moments to send a beacon from the
client, or adding a hook in PHP.
One thing I haven't proposed yet is to patch the server cache code so that it ignores
that extra argument when it decides if the cache needs updating, but still logs the
original url. But since I'm not familiar with that environment and we have so many
ideas still under review, I'll just drop it here. :-) And who knows what side effects
that would have.
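
To make that last idea concrete, here is a rough sketch of a cache lookup that ignores the
extra argument but still logs the original URL. It is not the actual server cache code: the
parameter name "prefetch", the helper names and the in-memory dict are all hypothetical
stand-ins chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter name; the thread does not say what the argument would be called.
IGNORED_PARAMS = frozenset({"prefetch"})


def cache_key(url: str) -> str:
    """Return the URL with analytics-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def handle(url: str, cache: dict, log: list) -> bool:
    """Log the original URL, then look the object up under its normalized key.

    Returns True on a cache hit. The dict stands in for the real cache.
    """
    log.append(url)          # the request log keeps the prefetch marker
    key = cache_key(url)     # the cache never sees it
    hit = key in cache
    cache.setdefault(key, "<image bytes>")
    return hit
```

With this, a prefetch request and a later explicit request for the same image share one
cache entry, while the log still shows which of the two was the prefetch.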
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Dan Andreescu
Sent: Thursday, February 05, 2015 21:46
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
Is this a potential solution to Oliver's concern:
For "real" image views, add an X-Analytics header value of
"real-view=true" to the request itself?
If that's not feasible, we should look into using statsv for this (not sure how that
works) or having this be a different kafka topic and not consumed into HDFS.
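
A rough sketch of how the log side could pick out such requests, assuming the X-Analytics
value is a semicolon-separated list of key=value pairs. The function names are made up for
illustration, and "real-view" is only the key proposed above, not an existing one.

```python
def parse_x_analytics(header: str) -> dict:
    """Split an X-Analytics value such as 'a=1;b=2' into a dict.

    Assumes semicolon-separated key=value pairs; items without '=' are skipped.
    """
    pairs = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key:
            pairs[key] = value
    return pairs


def is_real_view(header: str) -> bool:
    """True if the request carries the proposed real-view=true marker."""
    return parse_x_analytics(header).get("real-view") == "true"
```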
On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
I created a card -- modify as desired:
https://trello.com/c/HMgVD4mz
-Toby
On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
It turns out that the media viewer (on desktop; don't know about mobile) does a lot of
caching so just because an image is loaded from swift, it doesn't mean it is viewed.
We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs
to be added eventually. Let's leave it out of scope for now.
-Toby
On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
We want to include these files in the pageview definition? :/.
My point was more that we should try to avoid traffic-generating
requests that exist solely as a hack for analytics purposes; it's
artificial work for both users and us. If this is the only way of
doing things that's totally fine.
On 5 February 2015 at 11:38, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
solution would be basically doing the same thing as you propose.
Can you please run it past ops (especially the 404 vs. 204 part)?
Oliver -- the issue is that we'd like to figure out a way to provide
accurate views of the media files; because of client side caching, we can't
use the current requests. But your point is a good one -- we'll need to add
this to the PV definition.
-Toby
On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
A nice theory, but if they appear in the webrequest table (presumably
they would, and we're not creating an entirely new set of varnishes
for the transmission of dummy images?) they have to be factored in.
Again, however, the new definition automatically filters them by
checking the webrequest source and MIME type, so this is not a
problem, as I originally stated.
On 5 February 2015 at 08:10, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Oliver, this is not about pageviews, but about media file views.
These will be collected and dumped separately, as per
https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_coun…
.
Erik
From: analytics-bounces(a)lists.wikimedia.org
[mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
Sent: Wednesday, February 04, 2015 22:28
To: A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics.
Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
We would add a rule to Varnish to make sure it does not try to look up such
requests in Swift but returns a 404 immediately.
I bet ops would like it a lot better if this is a 204 and it kind of makes
sense as it is the code used for beacons and such. Otherwise they might get
alarms on 404s increasing.
On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
Not really; the new pageviews definition wouldn't include those files
anyway. It seems silly, though, to be deliberately generating a large
amount of automated noise and client requests for this :/.
On 4 February 2015 at 15:00, Gergo Tisza <gtisza(a)wikimedia.org> wrote:
Hi all,
Erik Zachte is working on file view stats and is looking for a way to track
Media Viewer image views (for which there is no 1:1 relation between server
hits and actual image views); after some back and forth in
https://phabricator.wikimedia.org/T86914 I proposed the following hack:
whenever the JavaScript code in MediaViewer determines that an image view
happened (e.g. an image has been displayed for a certain amount of time), it
makes a request to a certain fake image, say
upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
These hits can then be easily filtered from the Varnish request logs and
added to the normal requests. We would add a rule to Varnish to make sure it
does not try to look up such requests in Swift but returns a 404 immediately.
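
For illustration, the URL scheme and the log filtering could look roughly like this. It is
only a sketch under assumptions: the helper names are hypothetical, the image name is
assumed to be percent-encoded in the path, and the real client code would be JavaScript in
MediaViewer rather than Python.

```python
import re
from urllib.parse import quote, unquote

# Path prefix taken from the example URL above.
PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"

VIRTUAL_RE = re.compile(
    re.escape(PREFIX) + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(name: str, size: int, ext: str) -> str:
    """Build the fake thumbnail path that signals one image view."""
    return f"{PREFIX}{quote(name, safe='')}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str):
    """Return (image name, size, ext) for a virtual view hit, else None."""
    match = VIRTUAL_RE.match(path)
    if match is None:
        return None
    return unquote(match["name"]), int(match["size"]), match["ext"]
```

Running parse_virtual_view over the request paths in the logs would keep only the virtual
hits and recover the real image name, which can then be added to the counts for that image.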
This would be a temporary workaround until there is a proper way to log
virtual image views, such as EventLogging with a non-SQL backend.
Do you see any fundamental problem with this?
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Oliver Keyes
Research Analyst
Wikimedia Foundation