Hi Gergo -- I like this idea.  As far as capacity goes, any EL-Hadoop based solution would be basically doing the same thing as you propose.

Can you please run it past ops (especially the 404 vs. 204 part)?

Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client-side caching, we can't use the current requests. But your point is a good one -- we'll need to add this to the PV definition.

-Toby

On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
A nice theory, but if they appear in the webrequest table (presumably
they would, and we're not creating an entirely new set of varnishes
for the transmission of dummy images?) they have to be factored in.
Again, however, the new definition automatically filters them by
checking the webrequest source and MIME type, so this is not a
problem, as I originally stated.

On 5 February 2015 at 08:10, Erik Zachte <ezachte@wikimedia.org> wrote:
> Oliver, this is not about pageviews, but about media file views.
>
> These will be collected and dumped separately, as per
> https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
>
> Erik
>
> From: analytics-bounces@lists.wikimedia.org
> [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
> Sent: Wednesday, February 04, 2015 22:28
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>
>
>
>> We would add a rule to Varnish to make sure it does not try to look up such
>> requests in Swift but returns a 404 immediately.
>
> I bet ops would like it a lot better if this is a 204 and it kind of makes
> sense as it is the code used for beacons and such. Otherwise they might get
> alarms on 404s increasing.
>
> On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>
> Not really; the new pageviews definition wouldn't include those files
> anyway. It seems silly, though, to be deliberately generating a large
> amount of automated noise and client requests for this :/.
>
>
> On 4 February 2015 at 15:00, Gergo Tisza <gtisza@wikimedia.org> wrote:
>> Hi all,
>>
>> Erik Zachte is working on file view stats and is looking for a way to track
>> Media Viewer image views (for which there is no 1:1 relation between server
>> hits and actual image views); after some back and forth in
>> https://phabricator.wikimedia.org/T86914 I proposed the following hack:
>>
>> Whenever the JavaScript code in MediaViewer determines that an image view
>> happened (e.g. an image has been displayed for a certain amount of time), it
>> makes a request to a certain fake image, say
>> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
>> These hits can then be easily filtered from the Varnish request logs and
>> added to the normal requests. We would add a rule to Varnish to make sure it
>> does not try to look up such requests in Swift but returns a 404 immediately.
>>
>> This would be a temporary workaround until there is a proper way to log
>> virtual image views, such as EventLogging with a non-SQL backend.
>>
>> Do you see any fundamental problem with this?
>>
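A rough sketch of the proposal quoted above may make the round trip easier to picture: build the fake thumbnail URL for a viewed image, then recognise such hits again when scanning request paths from the logs. This is only an illustration, not MediaViewer or analytics code. The function names, the `https://` scheme, and the percent-encoding of the image name are assumptions; only the URL layout comes from the example in the message.

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# Prefix and path layout follow the example URL in the proposal;
# the scheme and the helper names below are illustrative assumptions.
VIRTUAL_PREFIX = "Virtual-imageview-"
BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb/0/00/"

_VIRTUAL_RE = re.compile(
    r"/thumb/0/00/" + re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_url(image_name: str, size_px: int, ext: str) -> str:
    """Build the fake thumbnail URL a client would request to record a view."""
    # Percent-encode the name so it stays a single path segment (assumption).
    encoded = quote(image_name, safe="")
    return f"{BASE}{VIRTUAL_PREFIX}{encoded}/{size_px}px-thumbnail.{ext}"


def parse_virtual_view(url_or_path: str):
    """Return (image_name, size_px) for a virtual view hit, else None."""
    match = _VIRTUAL_RE.search(url_or_path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size"))


def count_virtual_views(paths):
    """Count virtual view hits per image name, skipping ordinary requests."""
    counts = Counter()
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

Because the pattern only looks at the path, the count is the same whether the server answers these requests with a 404 or a 204.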
>
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>



--
Oliver Keyes
Research Analyst
Wikimedia Foundation
