why wouldn't we just track this with EventLogging?
I think a good use of EventLogging is tracking "events", not pageviews.
We do not need a capsule + schema + validation system to be able to count
pageviews. Plain requests would work fine; it is a much simpler use case.
On Thu, Feb 5, 2015 at 3:16 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
Bandwidth, I imagine? 25M events is a lot of events on top of the
existing throughput.
On 5 February 2015 at 18:13, Ryan Kaldari <rkaldari(a)wikimedia.org> wrote:
I have to admit that I haven't read all of this rather lengthy thread, but
why wouldn't we just track this with EventLogging? That would avoid all the
pitfalls of other possible solutions: dealing with caching, creating bogus
extra file requests, etc.
On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
>
> It turns out that the media viewer (on desktop; don't know about mobile)
> does a lot of caching, so just because an image is loaded from Swift, it
> doesn't mean it is viewed. We'd like to provide more accurate stats to the
> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
> it out of scope for now.
>
> -Toby
>
> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>>
>> We want to include these files in the pageview definition? :/.
>>
>> My point was more that we should try to avoid traffic-generating
>> requests that exist solely as a hack for analytics purposes; it's
>> artificial work for both users and us. If this is the only way of
>> doing things that's totally fine.
>>
>> On 5 February 2015 at 11:38, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
>> > Hi Gergo -- I like this idea. As far as capacity goes, any
>> > EL-Hadoop-based solution would be doing basically the same thing as you
>> > propose.
>> >
>> > Can you please run it past ops (especially the 404 vs. 204 part)?
>> >
>> > Oliver -- the issue is that we'd like to figure out a way to provide
>> > accurate views of the media files; because of client-side caching, we
>> > can't use the current requests. But your point is a good one -- we'll
>> > need to add this to the PV definition.
>> >
>> > -Toby
>> >
>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>> >>
>> >> A nice theory, but if they appear in the webrequest table (presumably
>> >> they would, and we're not creating an entirely new set of varnishes
>> >> for the transmission of dummy images?) they have to be factored in.
>> >> Again, however, the new definition automatically filters them out by
>> >> checking the webrequest source and MIME type, so this is not a
>> >> problem, as I originally stated.
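A minimal Python sketch of the kind of filter described above, assuming a simplified request record: the `WebRequest` class, its field names and the `"text"`/`"upload"` source values are illustrative assumptions modelled loosely on the webrequest table, not its actual schema or the real pageview definition. It shows why the virtual image-view hits would drop out: they are served by the upload caches and are not `text/html`.

```python
from dataclasses import dataclass


# Illustrative record shape: the field names and the "text"/"upload" source
# values are assumptions modelled loosely on the webrequest table, not its
# actual schema.
@dataclass
class WebRequest:
    webrequest_source: str  # cache cluster that served the request (assumed values)
    content_type: str       # MIME type of the response, possibly with parameters
    uri_path: str


def is_pageview_candidate(req: WebRequest) -> bool:
    """Keep only HTML responses served by the text caches.

    Hypothetical simplification of a pageview filter: a virtual image-view
    hit is served by the upload caches and is not text/html, so it is
    rejected whatever its URL looks like.
    """
    # Drop parameters such as "; charset=UTF-8" before comparing MIME types.
    mime = req.content_type.split(";", 1)[0].strip().lower()
    return req.webrequest_source == "text" and mime == "text/html"
```

The filter keys on where the response came from and what it is, not on the URL, which is why a new family of fake thumbnail URLs would not need a dedicated exclusion rule.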
>> >>
>> >> On 5 February 2015 at 08:10, Erik Zachte <ezachte(a)wikimedia.org> wrote:
>> >> > Oliver, this is not about pageviews, but about media file views.
>> >> >
>> >> > These will be collected and dumped separately, as per
>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_coun…
>> >> >
>> >> > Erik
>> >> >
>> >> > From: analytics-bounces(a)lists.wikimedia.org
>> >> > [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
>> >> > Sent: Wednesday, February 04, 2015 22:28
>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has
>> >> > an interest in Wikipedia and analytics.
>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>> >> >
>> >> >> We would add a rule to Varnish to make sure it does not try to look
>> >> >> up such requests in Swift but returns a 404 immediately.
>> >> >
>> >> > I bet ops would like it a lot better if this were a 204, and it kind
>> >> > of makes sense, as that is the code used for beacons and such.
>> >> > Otherwise they might get alarms about 404s increasing.
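A sketch of the 204 suggestion, written with Python's standard `http.server` purely as a stand-in for the cache-layer rule (a real deployment would express this in the cache configuration, not in Python). The `VIRTUAL_PREFIX` value is hypothetical and mirrors the URL pattern proposed further down the thread. Virtual-view beacons are answered immediately with 204 No Content instead of 404, so they never reach the storage lookup and do not inflate 404 error counts.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical prefix for the fake "virtual image view" URLs; it mirrors the
# pattern proposed in the thread but is not an existing production path.
VIRTUAL_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"


def beacon_status(path: str):
    """Return 204 for virtual-view beacons, or None for any other path.

    None stands for "pass the request on to the real storage backend".
    """
    if path.startswith(VIRTUAL_PREFIX):
        return HTTPStatus.NO_CONTENT  # 204: the hit is logged, storage is never queried
    return None


class BeaconHandler(BaseHTTPRequestHandler):
    """Toy handler that only models the beacon path."""

    def do_GET(self):
        status = beacon_status(self.path)
        if status is None:
            # Forwarding to the storage backend is out of scope for this sketch.
            self.send_error(HTTPStatus.NOT_FOUND, "Only virtual-view beacons are handled here")
            return
        self.send_response(status)
        self.end_headers()  # a 204 response carries no body
```

To try it locally, one could run `HTTPServer(("127.0.0.1", 8080), BeaconHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and request a path under the prefix. Answering with 204 signals "received, nothing to return", which is the conventional status for beacons and keeps these hits out of any alerting keyed on 404s.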
>> >> >
>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <okeyes(a)wikimedia.org>
>> >> > wrote:
>> >> >
>> >> > Not really; the new pageviews definition wouldn't include those files
>> >> > anyway. It seems silly, though, to be deliberately generating a large
>> >> > amount of automated noise and client requests for this :/.
>> >> >
>> >> >
>> >> > On 4 February 2015 at 15:00, Gergo Tisza <gtisza(a)wikimedia.org> wrote:
>> >> >> Hi all,
>> >> >>
>> >> >> Erik Zachte is working on file view stats and is looking for a way to
>> >> >> track Media Viewer image views (for which there is no 1:1 relation
>> >> >> between server hits and actual image views); after some back and forth
>> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
>> >> >> hack:
>> >> >>
>> >> >> Whenever the JavaScript code in MediaViewer determines that an image
>> >> >> view happened (e.g. an image has been displayed for a certain amount
>> >> >> of time), it makes a request to a certain fake image, say
>> >> >>
>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>> >> >>
>> >> >> These hits can then be easily filtered from the Varnish request logs
>> >> >> and added to the normal requests. We would add a rule to Varnish to
>> >> >> make sure it does not try to look up such requests in Swift but
>> >> >> returns a 404 immediately.
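A minimal Python sketch of both halves of the proposal, under stated assumptions: `virtual_view_path` builds the fake thumbnail path a client would request once it decides a view happened, and `parse_virtual_view` / `count_virtual_views` recover per-file counts from request-log paths. The function names and the percent-encoding of the file name are illustrative choices, not part of the proposal; the `<real image name>`, `<size>` and `<ext>` placeholders from the URL above become parameters, and the "displayed for a certain amount of time" decision is left out.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import quote, unquote

# Hypothetical path prefix following the proposed fake-image URL pattern
# (host omitted; log records are assumed to carry the path separately).
VIRTUAL_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"


def virtual_view_path(image_name: str, size_px: int, ext: str) -> str:
    """Build the fake thumbnail path requested after a counted view."""
    # safe="" also percent-encodes "/", so the name stays one path segment.
    return f"{VIRTUAL_PREFIX}{quote(image_name, safe='')}/{size_px}px-thumbnail.{ext}"


def parse_virtual_view(path: str) -> Optional[Tuple[str, int]]:
    """Return (image_name, size_px) for a virtual-view path, else None."""
    if not path.startswith(VIRTUAL_PREFIX):
        return None
    name, sep, thumb = path[len(VIRTUAL_PREFIX):].partition("/")
    if not sep or "px-thumbnail." not in thumb:
        return None
    size = thumb.partition("px-")[0]
    if not size.isdigit():
        return None
    return unquote(name), int(size)


def count_virtual_views(paths: Iterable[str]) -> Counter:
    """Count virtual views per real image name, ignoring all other paths."""
    counts: Counter = Counter()
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

For example, `virtual_view_path("Foo bar.jpg", 220, "jpg")` yields `/wikipedia/commons/thumb/0/00/Virtual-imageview-Foo%20bar.jpg/220px-thumbnail.jpg`, and parsing that path gives back `("Foo bar.jpg", 220)`. The resulting counts could then be added to the ordinary per-file request counts, as the proposal describes.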
>> >> >>
>> >> >> This would be a temporary workaround until there is a proper way to
>> >> >> log virtual image views, such as EventLogging with a non-SQL backend.
>> >> >>
>> >> >> Do you see any fundamental problem with this?
>> >> >>
>> >
>> >> _______________________________________________
>> >> Analytics mailing list
>> >> Analytics(a)lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>
>> >
>> >
>> >
>> > --
>> > Oliver Keyes
>> > Research Analyst
>> > Wikimedia Foundation
>> >
>>
>>
>>
>
>
>
>