> I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging?

I think a good use of EventLogging is tracking "events", not pageviews. We do not need a capsule + schema + validation system to be able to count pageviews. Plain requests would work fine; it is a much simpler use case.
On Thu, Feb 5, 2015 at 3:16 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Bandwidth, I imagine? 25M events is a lot of events on top of the existing throughput.
On 5 February 2015 at 18:13, Ryan Kaldari rkaldari@wikimedia.org wrote:
I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging? That would avoid all the pitfalls of other possible solutions: dealing with caching, creating bogus extra file requests, etc.
On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin tnegrin@wikimedia.org wrote:
It turns out that the media viewer (on desktop; don't know about mobile) does a lot of caching so just because an image is loaded from swift, it doesn't mean it is viewed. We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs to be added eventually. Let's leave it out of scope for now.
-Toby
On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes okeyes@wikimedia.org wrote:
We want to include these files in the pageview definition? :/.
My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes; it's artificial work for both users and us. If this is the only way of doing things that's totally fine.
On 5 February 2015 at 11:38, Toby Negrin tnegrin@wikimedia.org wrote:
Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based solution would be basically doing the same thing as you propose.
Can you please run it past ops (especially the 404 v 204 part)?
Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client side caching, we can't use the current requests. But your point is a good one -- we'll need to add this to the PV definition.
-Toby
On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes okeyes@wikimedia.org wrote:
A nice theory, but if they appear in the webrequest table (presumably they would, and we're not creating an entirely new set of varnishes for the transmission of dummy images?) they have to be factored in. Again, however, the new definition automatically filters them by checking the webrequest source and MIME type, so this is not a problem, as I originally stated.
On 5 February 2015 at 08:10, Erik Zachte ezachte@wikimedia.org wrote:
> Oliver, this is not about pageviews, but about media file views.
>
> These will be collected and dumped separately, as per
> https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_count... .
>
> Erik
>
> From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
> Sent: Wednesday, February 04, 2015 22:28
> To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
> Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>
>> We would add a rule to Varnish to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
>
> I bet ops would like it a lot better if this is a 204, and it kind of makes sense as it is the code used for beacons and such. Otherwise they might get alarms on 404s increasing.
>
> On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>
> Not really; the new pageviews definition wouldn't include those files anyway. It seems silly, though, to be deliberately generating a large amount of automated noise and client requests for this :/.
>
> On 4 February 2015 at 15:00, Gergo Tisza gtisza@wikimedia.org wrote:
>> Hi all,
>>
>> Erik Zachte is working on file view stats and is looking for a way to track Media Viewer image views (for which there is no 1:1 relation between server hits and actual image views); after some back and forth in https://phabricator.wikimedia.org/T86914 I proposed the following hack:
>>
>> whenever the javascript code in MediaViewer determines that an image view happened (e.g. an image has been displayed for a certain amount of time), it makes a request to a certain fake image, say
>>
>> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
>>
>> These hits can then be easily filtered from the varnish request logs and added to the normal requests. We would add a rule to Varnish to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
>>
>> This would be a temporary workaround until there is a proper way to log virtual image views, such as EventLogging with a non-SQL backend.
>>
>> Do you see any fundamental problem with this?
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
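
For anyone skimming the thread, the virtual-image hack proposed in the original message could be sketched roughly as follows. This is only an illustration in Python, not MediaViewer or Analytics code: the function names, the `https://` scheme, the URL-encoding of the image name, and the regular expression are all assumptions; only the URL shape (`/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>`) comes from the proposal itself.

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# URL shape taken from the proposal; the https:// scheme is an assumption.
BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb/0/00/"
VIRTUAL_PREFIX = "Virtual-imageview-"


def virtual_view_url(image_name, size_px, ext):
    """Build the fake thumbnail URL a client would request to signal a view.

    Percent-encoding the image name is an assumption made for this sketch.
    """
    encoded = quote(image_name, safe="")
    return f"{BASE}{VIRTUAL_PREFIX}{encoded}/{size_px}px-thumbnail.{ext}"


# Hypothetical pattern for recognising virtual view hits in request logs.
_VIRTUAL_RE = re.compile(
    r"/thumb/0/00/"
    + re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def parse_virtual_view(path):
    """Return (image_name, size_px) for a virtual view hit, else None."""
    match = _VIRTUAL_RE.search(path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size"))


def count_virtual_views(paths):
    """Count virtual view hits per image name, ignoring all other requests."""
    counts = Counter()
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

Whether the cache layer answers these requests with a 404 or a 204 makes no difference to this kind of filtering, since it only looks at the request path.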