> I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging?
I think EventLogging is best used for tracking "events", not pageviews. We do not need a capsule + schema + validation system just to count pageviews. Plain requests would work fine; it is a much simpler use case.
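A minimal sketch of what counting from plain requests could look like, assuming the Virtual-imageview URL pattern Gergo proposed further down in this thread. The helper names (`virtual_view_url`, `count_virtual_views`) are hypothetical, and the input is assumed to be a plain list of request URLs rather than the real webrequest/Varnish log format:

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# Fixed prefix taken from the URL pattern proposed in this thread; the
# "0/00" hash directory is copied verbatim from that proposal.
VIRTUAL_PREFIX = "upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-"

# Hypothetical regex for the proposed pattern:
#   .../Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
# Assumes URLs without query strings.
VIRTUAL_RE = re.compile(
    re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_url(image_name: str, size: int, ext: str) -> str:
    """Build the fake 'virtual image view' URL for one real image (hypothetical helper)."""
    return f"{VIRTUAL_PREFIX}{quote(image_name)}/{size}px-thumbnail.{ext}"


def count_virtual_views(request_urls):
    """Count virtual image views per real image name, ignoring all other requests."""
    counts = Counter()
    for url in request_urls:
        match = VIRTUAL_RE.search(url)
        if match:
            # Undo the percent-encoding applied when the URL was built.
            counts[unquote(match.group("name"))] += 1
    return counts
```

The point is only that no schema or validation layer is involved: counting reduces to a URL filter over ordinary request logs, and ordinary thumbnail requests fall through untouched.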


On Thu, Feb 5, 2015 at 3:16 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Bandwidth, I imagine? 25M events is a lot of events on top of the
existing throughput.

On 5 February 2015 at 18:13, Ryan Kaldari <rkaldari@wikimedia.org> wrote:
> I have to admit that I haven't read all of this rather lengthy thread, but
> why wouldn't we just track this with EventLogging? That would avoid all the
> pitfalls of other possible solutions: dealing with caching, creating bogus
> extra file requests, etc.
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tnegrin@wikimedia.org> wrote:
>>
>> It turns out that Media Viewer (on desktop; don't know about mobile) does
>> a lot of caching, so just because an image is loaded from Swift doesn't
>> mean it is viewed. We'd like to provide more accurate stats to the GLAM
>> folks, so yes, I think this needs to be added eventually. Let's leave it
>> out of scope for now.
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>>
>>> We want to include these files in the pageview definition? :/.
>>>
>>> My point was more that we should try to avoid traffic-generating
>>> requests that exist solely as a hack for analytics purposes; it's
>>> artificial work for both users and us. If this is the only way of
>>> doing things that's totally fine.
>>>
>>> On 5 February 2015 at 11:38, Toby Negrin <tnegrin@wikimedia.org> wrote:
>>> > Hi Gergo -- I like this idea. As far as capacity goes, any EL-Hadoop-based
>>> > solution would basically be doing the same thing as you propose.
>>> >
>>> > Can you please run it past ops (especially the 404 vs. 204 part)?
>>> >
>>> > Oliver -- the issue is that we'd like to figure out a way to provide
>>> > accurate view counts for media files; because of client-side caching,
>>> > we can't use the current requests. But your point is a good one -- we'll
>>> > need to add this to the PV definition.
>>> >
>>> > -Toby
>>> >
>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>> >>
>>> >> A nice theory, but if they appear in the webrequest table (presumably
>>> >> they would, and we're not creating an entirely new set of varnishes
>>> >> for the transmission of dummy images?), they have to be factored in.
>>> >> Again, however, the new definition automatically filters them by
>>> >> checking the webrequest source and MIME type, so this is not a
>>> >> problem, as I originally stated.
>>> >>
>>> >> On 5 February 2015 at 08:10, Erik Zachte <ezachte@wikimedia.org> wrote:
>>> >> > Oliver, this is not about pageviews, but about media file views.
>>> >> >
>>> >> > These will be collected and dumped separately, as per
>>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
>>> >> >
>>> >> > Erik
>>> >> >
>>> >> > From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>> >> >
>>> >> >>We would add a rule to Varnish to make sure it does not try to look up
>>> >> >> such requests in Swift but returns a 404 immediately.
>>> >> >
>>> >> > I bet ops would like it a lot better if this were a 204, and that
>>> >> > makes sense, as it is the status code used for beacons and such.
>>> >> > Otherwise they might get alarms about 404s increasing.
>>> >> >
>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>> >> >
>>> >> > Not really; the new pageviews definition wouldn't include those files
>>> >> > anyway. It seems silly, though, to be deliberately generating a large
>>> >> > amount of automated noise and client requests for this :/.
>>> >> >
>>> >> >
>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <gtisza@wikimedia.org> wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> Erik Zachte is working on file view stats and is looking for a way to
>>> >> >> track Media Viewer image views (for which there is no 1:1 relation
>>> >> >> between server hits and actual image views); after some back and forth
>>> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
>>> >> >> hack:
>>> >> >>
>>> >> >> Whenever the JavaScript code in MediaViewer determines that an image
>>> >> >> view happened (e.g. an image has been displayed for a certain amount
>>> >> >> of time), it makes a request to a certain fake image, say
>>> >> >>
>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>> >> >>
>>> >> >> These hits can then be easily filtered from the Varnish request logs
>>> >> >> and added to the normal requests. We would add a rule to Varnish to
>>> >> >> make sure it does not try to look up such requests in Swift but
>>> >> >> returns a 404 immediately.
>>> >> >>
>>> >> >> This would be a temporary workaround until there is a proper way to
>>> >> >> log virtual image views, such as EventLogging with a non-SQL backend.
>>> >> >>
>>> >> >> Do you see any fundamental problem with this?
>>> >> >>
>>> >> >
>>> >> >> _______________________________________________
>>> >> >> Analytics mailing list
>>> >> >> Analytics@lists.wikimedia.org
>>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Oliver Keyes
>>> >> > Research Analyst
>>> >> > Wikimedia Foundation
>>> >>
>>> >
>>>
>>
>