Hi all,
Erik Zachte is working on file view stats and is looking for a way to track Media Viewer image views (for which there is no 1:1 relation between server hits and actual image views); after some back and forth in https://phabricator.wikimedia.org/T86914 I proposed the following hack:
whenever the JavaScript code in MediaViewer determines that an image view has happened (e.g. an image has been displayed for a certain amount of time), it makes a request to a certain fake image, say upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>. These hits can then easily be filtered from the Varnish request logs and added to the normal requests. We would add a rule to Varnish to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
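A minimal sketch of the client-side half of this proposal. MediaViewer itself is JavaScript; Python is used here only to make the logic concrete. It assumes that a "view" means the image stayed on screen for at least a fixed dwell time (the 1-second threshold is a made-up placeholder), and it reuses the fixed `0/00` prefix from the example URL; `virtual_view_url` and `ViewTracker` are hypothetical names, not MediaViewer code.

```python
from urllib.parse import quote

# Fixed fake-image prefix taken from the example URL in the proposal.
BEACON_PREFIX = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-"
)
MIN_DISPLAY_SECONDS = 1.0  # hypothetical dwell time that counts as a "view"


def virtual_view_url(file_name: str, width: int) -> str:
    """Build the fake thumbnail URL that stands in for one image view."""
    ext = file_name.rsplit(".", 1)[-1]
    name = quote(file_name.replace(" ", "_"))
    return f"{BEACON_PREFIX}{name}/{width}px-thumbnail.{ext}"


class ViewTracker:
    """Emits a beacon only once an image has stayed on screen long enough."""

    def __init__(self, min_seconds: float = MIN_DISPLAY_SECONDS) -> None:
        self.min_seconds = min_seconds
        self.shown_at = {}  # file name -> time the image was displayed
        self.sent = []  # beacon URLs that would have been requested

    def on_display(self, file_name: str, now: float) -> None:
        self.shown_at[file_name] = now

    def on_hide(self, file_name: str, width: int, now: float) -> None:
        started = self.shown_at.pop(file_name, None)
        if started is not None and now - started >= self.min_seconds:
            # In the browser this would be a request for the fake image.
            self.sent.append(virtual_view_url(file_name, width))


tracker = ViewTracker()
tracker.on_display("Example.jpg", now=0.0)
tracker.on_hide("Example.jpg", width=1024, now=2.5)  # long enough: beacon sent
tracker.on_display("Preloaded.png", now=3.0)
tracker.on_hide("Preloaded.png", width=1024, now=3.2)  # too short: ignored
```

Images that are fetched but never shown, or shown only briefly, produce no beacon, which is exactly what breaks the 1:1 link between server hits and views.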
This would be a temporary workaround until there is a proper way to log virtual image views, such as EventLogging with a non-SQL backend.
Do you see any fundamental problem with this?
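The other half, filtering these hits out of the request logs and adding them to the normal per-file counts, could look roughly like this. It assumes the Varnish log has already been reduced to request paths; both regular expressions are simplified guesses at the upload.wikimedia.org path layout, and `count_file_views` is a hypothetical helper.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Simplified path patterns; real upload.wikimedia.org URLs have more variants.
VIRTUAL_RE = re.compile(
    r"^/wikipedia/commons/thumb/0/00/Virtual-imageview-(?P<name>[^/]+)"
    r"/\d+px-thumbnail\.\w+$"
)
FILE_RE = re.compile(
    r"^/wikipedia/commons/(?:thumb/)?[0-9a-f]/[0-9a-f]{2}/(?P<name>[^/]+)(?:/[^/]+)?$"
)


def count_file_views(paths):
    """Return (combined, virtual) per-file counts from a list of request paths."""
    normal, virtual = Counter(), Counter()
    for path in paths:
        # Check the fake image first: FILE_RE would also match its path.
        m = VIRTUAL_RE.match(path)
        if m:
            virtual[unquote(m.group("name"))] += 1
            continue
        m = FILE_RE.match(path)
        if m:
            normal[unquote(m.group("name"))] += 1
    # "Added to the normal requests": virtual views are summed with real hits.
    return normal + virtual, virtual


combined, virtual = count_file_views([
    "/wikipedia/commons/thumb/a/ab/Example.jpg/220px-Example.jpg",
    "/wikipedia/commons/thumb/0/00/Virtual-imageview-Example.jpg/1024px-thumbnail.jpg",
    "/w/index.php",  # not a media file, ignored
])
```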
Not really; the new pageviews definition wouldn't include those files anyway. It seems silly, though, to be deliberately generating a large amount of automated noise and client requests for this :/.
On 4 February 2015 at 15:00, Gergo Tisza gtisza@wikimedia.org wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
> We would add a rule to Varnish to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
I bet ops would like it a lot better if this were a 204, and it kind of makes sense, as that is the code used for beacons and such. Otherwise they might get alarms about 404s increasing.
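The 204-versus-404 question only concerns what the edge sends back for a beacon path. The real rule would live in Varnish configuration; the Python handler below is just a stand-in to show the behaviour, assuming any path containing `/Virtual-imageview-` is a beacon. Everything else gets a 404 here purely as a placeholder for "pass to the backend".

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

BEACON_MARKER = "/Virtual-imageview-"  # assumed marker for fake view images


def status_for(path: str) -> HTTPStatus:
    """Decide the response status without ever touching Swift."""
    if BEACON_MARKER in path:
        return HTTPStatus.NO_CONTENT  # 204: the usual answer for beacons
    return HTTPStatus.NOT_FOUND  # placeholder for "forward to the real backend"


class BeaconHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = status_for(self.path)
        self.send_response(status)
        if status != HTTPStatus.NO_CONTENT:
            # RFC 7230 forbids Content-Length on 204 responses, so only send it here.
            self.send_header("Content-Length", "0")
        self.end_headers()
```

To try it locally, `HTTPServer(("127.0.0.1", 8080), BeaconHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve it on port 8080.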
On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
Oliver, this is not about pageviews, but about media file views.
These will be collected and dumped separately, as per https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_count... .
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz Sent: Wednesday, February 04, 2015 22:28 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
A nice theory, but if they appear in the webrequest table (presumably they would, and we're not creating an entirely new set of varnishes for the transmission of dummy images?) they have to be factored in. Again, however, the new definition automatically filters them by checking the webrequest source and MIME type, so this is not a problem, as I originally stated.
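A rough illustration of the kind of filter Oliver describes, not the actual pageview definition: requests served by the upload cluster, or with a non-HTML MIME type, are excluded before any pageview logic runs. The field names and the `text`/`mobile`/`upload` source values are assumptions for the sketch.

```python
from dataclasses import dataclass


@dataclass
class WebRequest:
    # Illustrative fields only; not the exact webrequest table schema.
    webrequest_source: str  # which cache cluster served it, e.g. "text" or "upload"
    content_type: str
    uri_path: str


PAGEVIEW_SOURCES = {"text", "mobile"}  # assumed cluster names


def could_be_pageview(req: WebRequest) -> bool:
    """Cheap pre-filter: only HTML served by the text/mobile caches can be a pageview."""
    if req.webrequest_source not in PAGEVIEW_SOURCES:
        return False  # upload traffic, fake view beacons included, is dropped here
    return req.content_type.startswith("text/html")
```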
On 5 February 2015 at 08:10, Erik Zachte ezachte@wikimedia.org wrote:
Hi Gergo -- I like this idea. As far as capacity goes, any EL-Hadoop-based solution would be doing basically the same thing as you propose.
Can you please run it past ops (especially the 404 vs. 204 part)?
Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client-side caching, we can't use the current requests. But your point is a good one -- we'll need to add this to the PV definition.
-Toby
On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes okeyes@wikimedia.org wrote:
We want to include these files in the pageview definition? :/.
My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes; it's artificial work for both users and us. If this is the only way of doing things, that's totally fine.
On 5 February 2015 at 11:38, Toby Negrin tnegrin@wikimedia.org wrote:
It turns out that the media viewer (on desktop; don't know about mobile) does a lot of caching, so just because an image is loaded from Swift doesn't mean it is viewed. We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs to be added eventually. Let's leave it out of scope for now.
-Toby
On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes okeyes@wikimedia.org wrote:
I created a card -- modify as desired:
-Toby
On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin tnegrin@wikimedia.org wrote:
Is this a potential solution to Oliver's concern:
For "real" image views, add an X-Analytics header value of "real-view=true" to the request itself?
If that's not feasible, we should look into using statsv for this (not sure how that works) or having this be a different Kafka topic that is not consumed into HDFS.
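Dan's alternative avoids the extra request altogether: the real image request carries a marker, and the log side only has to look for it. A sketch, assuming the `real-view=true` value from his message, a `key=value;key=value` layout for the X-Analytics header, and log records already reduced to (file name, header) pairs; both helpers are hypothetical.

```python
from collections import Counter


def parse_x_analytics(header: str) -> dict:
    """Split an X-Analytics value of the form 'k1=v1;k2=v2' into a dict."""
    pairs = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:  # skip empty or malformed fragments
            pairs[key] = value
    return pairs


def count_real_views(records):
    """records: iterable of (file_name, x_analytics_header) tuples from the log."""
    views = Counter()
    for file_name, header in records:
        if parse_x_analytics(header).get("real-view") == "true":
            views[file_name] += 1
    return views
```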
On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin tnegrin@wikimedia.org wrote:
IIRC, it's actually desirable to have these PVs in Hadoop so we can run the queries in concert with mobile page views.
Erik Z -- thoughts?
-Toby
Dan,
The issue is that MediaViewer always fetches two images: the one the user clicked on and the next one in the same article. The second image might be shown to the user later, when she clicks the right arrow, at which point yet another image is prefetched. So only later does it become clear whether the prefetched image is actually shown to the user.
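To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. MediaViewer itself is JavaScript; `ViewTracker` and `simulate_session` are hypothetical names, not MediaViewer code. Every fetch, including prefetches, is a server hit, but only images that are actually displayed count as views.

```python
from collections import Counter


class ViewTracker:
    """Hypothetical model separating server hits from actual image views."""

    def __init__(self):
        self.server_hits = Counter()  # every fetch, prefetches included
        self.views = Counter()        # only images actually shown to the user

    def load(self, image):
        # Both the clicked image and the prefetched one hit the server.
        self.server_hits[image] += 1

    def display(self, image):
        # Only this step corresponds to a real image view.
        self.views[image] += 1


def simulate_session(tracker, images, clicked, presses):
    """User clicks images[clicked], then presses the right arrow `presses` times."""
    current = clicked
    tracker.load(images[current])
    tracker.display(images[current])
    if current + 1 < len(images):
        tracker.load(images[current + 1])  # prefetch the next image
    for _ in range(presses):
        if current + 1 >= len(images):
            break
        current += 1
        tracker.display(images[current])  # served from the prefetch, no new hit
        if current + 1 < len(images):
            tracker.load(images[current + 1])  # prefetch the one after it


tracker = ViewTracker()
simulate_session(tracker, ["A.jpg", "B.jpg", "C.jpg"], clicked=0, presses=0)
print(sum(tracker.server_hits.values()), sum(tracker.views.values()))  # 2 1
```

With no arrow presses, two server hits correspond to a single view; the prefetched B.jpg only becomes a view if the user actually moves on to it.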
We thought of differentiating between the explicitly requested first image and the implicitly prefetched follow-up images by adding a new X-Analytics parameter for prefetched images (and simply ignoring those), but that would harm our server cache, as two versions of the same image would be stored due to slightly different URLs.
Several variations are still under discussion: sending a beacon from the client at different moments, or adding a hook in PHP.
One thing I haven't proposed yet is to patch the server cache code so that it ignores that extra argument when deciding whether the cache needs updating, but still logs the original URL. Since I'm not familiar with that environment and we have so many ideas still under review, I'll just drop it here :) And who knows what side effects that would have.
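The cache patch idea amounts to separating the cache key from the logged URL. A minimal Python sketch of that idea, assuming a hypothetical query parameter named `prefetch` marks prefetched requests; the real cache layer is Varnish, so this only illustrates the concept and is not a configuration for it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical analytics-only parameter; not an existing MediaViewer or Varnish flag.
ANALYTICS_PARAM = "prefetch"


def cache_key(url):
    """URL with the analytics-only parameter stripped, used as the cache key.

    Requests that differ only in that parameter share one cached copy.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != ANALYTICS_PARAM]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def handle_request(url, cache, log):
    """Log the original URL, but look up and store the object by normalized key."""
    log.append(url)  # the request log still sees the extra parameter
    key = cache_key(url)
    if key not in cache:
        cache[key] = b"image bytes for " + key.encode()  # stand-in for a backend fetch
    return cache[key]
```

Two requests for the same image, one with `?prefetch=1` and one without, then produce two log lines but only one cache entry, which is exactly the property the prefetch flag would otherwise break.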
Erik
Since it's apparently coming across incorrectly, let me say it very, very explicitly:
These images are not counted in the current pageview definition.
Even if you send dummy images, they will not be counted in the current pageview definition.
If these images are counted in the current pageview definition, I insist that you immediately either fire me for being so utterly stupid as to write a pageview definition that included them - or alternately that Erik give me Toby's job for being so incredibly smart that I got away with doing something so incredibly stupid, and snuck it past three levels of management successfully ;).
My concern is not pageview-related; my concern is simply me being stuffy and grumbling that we're making the client execute an extra request that takes non-null bandwidth (and making the server handle said requests) solely for GLAM purposes - noting that I accept that it is necessary /for/ those purposes, and that I am totally fine with us doing that if we can't see a smarter and less disruptive way of achieving the same thing. I am not asking people to stop sending these images or to create a special class of Varnish machines, nor am I pulling my hair out because I wrote a pageview definition that will include such requests (see: above).
On Thu, Feb 5, 2015 at 1:09 PM, Oliver Keyes okeyes@wikimedia.org wrote:
My concern is not pageview-related, my concern is simply me being stuffy and grumbling that we're making the client execute an extra request that takes non-null bandwidth (and making the server handle said requests) solely for GLAM purposes - noting that I accept that it is necessary /for/ those purposes, and that I am totally fine with us doing that if we can't see a smarter and less disruptive way of achieving the same thing.
On a very generic level, accurately understanding user behavior (which is key for writing high-quality software and producing high-quality content) is not possible without client-side code having a way to send information to the server, and that will take extra requests and extra bandwidth. In the long term, there might be smarter ways that take fewer extra requests and less bandwidth (request batching, WebSockets, SPDY, etc.); file view stats tracking needs to be done soon-ish, though, and I don't think there is a quick and easy way that is less disruptive than the naive approach of generating fake request logs.
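The "naive approach" is the fake-thumbnail request proposed at the start of this thread (`upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>`). A minimal Python sketch of both ends, assuming the request log can be reduced to a list of URL paths; `beacon_path` and `count_virtual_views` are hypothetical helpers, not part of MediaViewer or the actual log pipeline.

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# Path pattern from the proposal: a fake thumbnail under a fixed 0/00 directory.
PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"
BEACON_RE = re.compile(
    re.escape(PREFIX) + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def beacon_path(image_name, size, ext):
    """Path the client would request once it decides a view really happened."""
    return f"{PREFIX}{quote(image_name, safe='')}/{size}px-thumbnail.{ext}"


def count_virtual_views(paths):
    """Count virtual views per real image name; ordinary requests are ignored."""
    views = Counter()
    for path in paths:
        match = BEACON_RE.match(path)
        if match:
            views[unquote(match.group("name"))] += 1
    return views
```

Because every virtual view shares the fixed `Virtual-imageview-` prefix, filtering it out of (or into) the normal request counts is a single pattern match per log line.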
To put that disruptiveness in perspective, though, a random English Wikipedia page seems to be around 50 KB of total traffic with a warm cache. A random image request by MediaViewer is maybe 100-200 KB. An empty request to log a virtual file view is a few hundred bytes, so it will increase the traffic by ~0.1%. By the most inclusive definition, there are about 25M file views per day in MediaViewer; total server requests are in the range of 2B per month according to this slightly outdated stat (http://stats.wikimedia.org/#requests), so again the increase is in the range of ~0.1%.
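A quick check of the bandwidth half of the estimate above, as a Python sketch. The page and image sizes are the figures from the message; the 300-byte beacon is an assumed value standing in for "a few hundred bytes", and counting one page load per file view is a simplifying assumption.

```python
# Figures from the message; BEACON_BYTES is an assumption ("a few hundred bytes").
PAGE_BYTES = 50_000                # random enwiki page, warm cache (~50 KB)
IMAGE_BYTES = (100_000, 200_000)   # MediaViewer image request (~100-200 KB)
BEACON_BYTES = 300                 # one empty logging request


def overhead(image_bytes):
    """Fraction of extra traffic added by one beacon per file view."""
    return BEACON_BYTES / (PAGE_BYTES + image_bytes)


for size in IMAGE_BYTES:
    print(f"image {size // 1000} KB: +{overhead(size):.2%}")
# image 100 KB: +0.20%
# image 200 KB: +0.12%
```

Under these assumptions the extra traffic lands in the low tenths of a percent, the same order of magnitude as the ~0.1% estimate.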
[Oliver] My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes;
[Dan] Is this a potential solution to Oliver's concern:
I disagree that we should be concerned about "beacons" to identify preloads: just as beacons exist for ads or stats, using one to identify preloads doesn't seem far-fetched (I have certainly used similar code before and it did its job).
Note that EL works in a similar fashion, requesting a "fake" image from Varnish, to which we answer with a 204. The reason we have such code is that we do not have a specific endpoint or domain where requests of this type could go; pretty much everything requested by our users and by ourselves ends up in Varnish.
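The 204-versus-404 point can be illustrated with a tiny Python WSGI app. This is only a sketch: in production the rule would live in Varnish, and `STORE`, `BEACON_MARKER`, and `beacon_app` are hypothetical stand-ins. Beacon paths are answered with `204 No Content` without ever consulting storage, so they neither cost a Swift lookup nor inflate the 404 rate.

```python
# Hypothetical in-memory stand-in for the real image store (Swift).
STORE = {"/wikipedia/commons/a/ab/Example.jpg": b"<jpeg bytes>"}
BEACON_MARKER = "/Virtual-imageview-"


def beacon_app(environ, start_response):
    """Minimal WSGI app: answer beacons with 204, serve or 404 everything else."""
    path = environ.get("PATH_INFO", "")
    if BEACON_MARKER in path:
        # Beacon hit: never touch the store; the request log entry is all we need.
        start_response("204 No Content", [])
        return []
    body = STORE.get(path)
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "image/jpeg")])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, beacon_app).serve_forever()` is enough.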
Nuria & Erik: you're totally right; I keep forgetting this problem is more complicated than I think.
So we should figure out how this statsv magic thing works and see if we can use it here.
On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz nuria@wikimedia.org wrote:
[Oliver] My point was more that we should try to avoid traffic-generating [Oliver] requests that exist solely as a hack for analytics purposes; [Dan] Is this a potential solution to Oliver's concern:
I disagree we should be concern about "beacons" to identify preloads, just like beacons exist for ads or stats using one to identify preloads doesn't seem far fetched (certainly I have used similar code before and it did its job).
Note that EL works in a similar fashion requesting a "fake" image to varnish to which we answer with a 204. It is very similar and the reason why we have such a code is that we do not have a specific endpoint or domain where requests of this type could go. Everything requested by our users and ourselves ends up in varnish pretty much.
On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
Is this a potential solution to Oliver's concern:
For "real" image views, add an X-Analytics header value of "real-view=true" to the request itself?
If that's not feasible, we should look into using statsv for this (not sure how that works) or having this be a different kafka topic and not consumed into HDFS.
On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin tnegrin@wikimedia.org wrote:
I created a card -- modify as desired:
-Toby
On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin tnegrin@wikimedia.org wrote:
It turns out that the media viewer (on desktop; don't know about mobile) does a lot of caching so just because an image is loaded from swift, it doesn't mean it is viewed. We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs to be added eventually. Let's leave it out of scope for now.
-Toby
On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes okeyes@wikimedia.org wrote:
We want to include these files in the pageview definition? :/.
My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes; it's artificial work for both users and us. If this is the only way of doing things that's totally fine.
On 5 February 2015 at 11:38, Toby Negrin tnegrin@wikimedia.org wrote:
Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based solution would be basically doing the same thing as you propose.
Can you please run it past ops (especially the 404 vs. 204 part)?
Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client-side caching, we can't use the current requests. But your point is a good one -- we'll need to add this to the PV definition.
-Toby
On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes okeyes@wikimedia.org wrote:
A nice theory, but if they appear in the webrequest table (presumably they would, and we're not creating an entirely new set of varnishes for the transmission of dummy images?) they have to be factored in. Again, however, the new definition automatically filters them by checking the webrequest source and MIME type, so this is not a problem, as I originally stated.
On 5 February 2015 at 08:10, Erik Zachte ezachte@wikimedia.org wrote:
Oliver, this is not about pageviews, but about media file views.
These will be collected and dumped separately, as per https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_count... .
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz
Sent: Wednesday, February 04, 2015 22:28
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
We would add a rule to Vagrant to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
I bet ops would like it a lot better if this is a 204, and it kind of makes sense, as it is the code used for beacons and such. Otherwise they might get alarms on 404s increasing.
On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Not really; the new pageviews definition wouldn't include those files anyway. It seems silly, though, to be deliberately generating a large amount of automated noise and client requests for this :/.
On 4 February 2015 at 15:00, Gergo Tisza gtisza@wikimedia.org wrote:
Hi all,
Erik Zachte is working on file view stats and is looking for a way to track Media Viewer image views (for which there is no 1:1 relation between server hits and actual image views); after some back and forth in https://phabricator.wikimedia.org/T86914 I proposed the following hack:
whenever the javascript code in MediaViewer determines that an image view happened (e.g. an image has been displayed for a certain amount of time), it makes a request to a certain fake image, say upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> . These hits can then be easily filtered from the varnish request logs and added to the normal requests. We would add a rule to Vagrant to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
This would be a temporary workaround until there is a proper way to log virtual image views, such as EventLogging with a non-SQL backend.
Do you see any fundamental problem with this?
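Gergo's proposal amounts to an encoding (image name and width packed into a fake thumbnail path) plus a log filter that decodes it. A sketch under stated assumptions: log rows expose the request path without the host, the image name is percent-encoded, and `virtual_view_path`, `parse_virtual_view` and `count_virtual_views` are hypothetical helpers, not existing MediaViewer or analytics code.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import quote, unquote

# Path layout copied from the proposal; the fixed 0/00 directory comes from its example URL.
_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"
_VIRTUAL_RE = re.compile(
    r"^/wikipedia/commons/thumb/0/00/Virtual-imageview-"
    r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name: str, size: int, ext: str) -> str:
    """Build the fake-thumbnail path the client would request after a view."""
    # safe="" so that a "/" in the name cannot break the path structure.
    return f"{_PREFIX}{quote(image_name, safe='')}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str) -> Optional[Tuple[str, int]]:
    """Return (image name, width) for a virtual-view hit, or None for real requests."""
    m = _VIRTUAL_RE.match(path)
    if m is None:
        return None
    return unquote(m.group("name")), int(m.group("size"))


def count_virtual_views(paths: Iterable[str]) -> Counter:
    """Count virtual views per image, ignoring every other request in the log."""
    counts: Counter = Counter()
    for path in paths:
        hit = parse_virtual_view(path)
        if hit is not None:
            counts[hit[0]] += 1
    return counts
```

The resulting per-image counts are what would be added to the normal file request counts.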
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation
I'm not sure why a beacon would have to be a dummy html file, thus confusing PV stats.
Could it not be a dummy image request, more in line with the one-pixel images that are often used?
This way Oliver can relax, go on vacation for real, without keeping a close watch over PV definitions.
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Dan Andreescu
Sent: Thursday, February 05, 2015 22:43
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
Nuria & Erik: you're totally right, I keep forgetting this problem is more complicated than I think.
So we should figure out how this statsv magic thing works and see if we can use it here.
On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz nuria@wikimedia.org wrote:
[Oliver] My point was more that we should try to avoid traffic-generating
[Oliver] requests that exist solely as a hack for analytics purposes;
[Dan] Is this a potential solution to Oliver's concern:
I disagree that we should be concerned about "beacons" to identify preloads. Just as beacons exist for ads or stats, using one to identify preloads doesn't seem far-fetched (I have certainly used similar code before, and it did its job).
Note that EL works in a similar fashion: it requests a "fake" image from Varnish, which we answer with a 204. It is very similar, and the reason we use such a code is that we do not have a specific endpoint or domain where requests of this type could go. Pretty much everything requested by our users and ourselves ends up in Varnish.
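The client side of the beacon idea is what separates preloads from views: only an image that is actually displayed for some minimum time triggers a request. A sketch of that logic, with a hypothetical `ViewTracker` class, an arbitrary threshold, and time passed in explicitly so it can be tested; the thread does not specify the threshold or whether repeat displays count separately (here each display counts once).

```python
from typing import Callable, Dict, Set


class ViewTracker:
    """Fire one beacon per display once an image has been visible for min_seconds.

    Preloaded images are never reported as shown, so they never produce a beacon.
    """

    def __init__(self, send_beacon: Callable[[str], None], min_seconds: float = 1.0) -> None:
        self._send = send_beacon
        self._min = min_seconds
        self._visible: Dict[str, float] = {}  # image -> time it became visible
        self._reported: Set[str] = set()      # images already counted for the current display

    def shown(self, image: str, now: float) -> None:
        if image not in self._visible:
            self._visible[image] = now
            self._reported.discard(image)  # a new display may be counted again

    def hidden(self, image: str) -> None:
        self._visible.pop(image, None)

    def tick(self, now: float) -> None:
        """Call periodically (e.g. from a timer); sends beacons for images past the threshold."""
        for image, start in list(self._visible.items()):
            if image not in self._reported and now - start >= self._min:
                self._reported.add(image)
                self._send(image)
```

In the browser, `send_beacon` would issue the fake-image request; here it can be any callable, such as `list.append` in a test.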
Erik,
Again: I am not, and at no point in this conversation have been, concerned about the pageview definition.
(Repeat no. 5)
I'm not sure why a beacon would have to be a dummy html file, thus confusing PV stats.
Could it not be a dummy image request, more in line with the one pixel images that are often used.
Right. I agree a dummy image makes more sense.
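Erik's point and Oliver's earlier remark (the new definition filters on webrequest source and MIME type) can be illustrated together: an image beacon served from the upload caches fails a text/html check, whereas a dummy HTML file could slip through. The rule below is a crude hypothetical stand-in, not the actual pageview definition, and the source values "text" and "upload" are assumptions.

```python
from typing import NamedTuple


class Request(NamedTuple):
    webrequest_source: str  # assumed values such as "text" or "upload"
    content_type: str       # raw Content-Type response header, may be empty


def could_be_pageview(req: Request) -> bool:
    """Crude stand-in for a pageview filter: HTML served by the text caches only."""
    mime = req.content_type.split(";", 1)[0].strip().lower()
    return req.webrequest_source == "text" and mime == "text/html"
```

Under this rule an image beacon, or a 204 with no Content-Type at all, is never counted as a pageview.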
I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging? That would avoid all the pitfalls of other possible solutions: dealing with caching, creating bogus extra file requests, etc.
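For comparison with the fake-file approach, an EventLogging-style beacon carries the event as structured data in a single GET request's query string. The endpoint, schema name and payload layout below are hypothetical placeholders, not EventLogging's actual wire format.

```python
import json
from urllib.parse import quote, unquote, urlsplit

BEACON_URL = "https://example.org/beacon/event"  # hypothetical endpoint


def event_beacon_url(schema: str, revision: int, event: dict) -> str:
    """Encode an event as compact, percent-encoded JSON in the query string."""
    payload = {"schema": schema, "revision": revision, "event": event}
    return BEACON_URL + "?" + quote(json.dumps(payload, separators=(",", ":")), safe="")


def decode_event(url: str) -> dict:
    """Inverse of event_beacon_url, as a log processor might apply it."""
    return json.loads(unquote(urlsplit(url).query))
```

Either way, one request per view reaches the caches; the difference is that the payload is structured rather than packed into a file path.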
Bandwidth, I imagine? 25M events is a lot of events on top of the existing throughput.
On 5 February 2015 at 18:13, Ryan Kaldari rkaldari@wikimedia.org wrote:
I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging? That would avoid all the pitfalls of other possible solutions: dealing with caching, creating bogus extra file requests, etc.
On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin tnegrin@wikimedia.org wrote:
It turns out that the media viewer (on desktop; don't know about mobile) does a lot of caching, so just because an image is loaded from Swift, it doesn't mean it is viewed. We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs to be added eventually. Let's leave it out of scope for now.
-Toby
On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes okeyes@wikimedia.org wrote:
We want to include these files in the pageview definition? :/.
My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes; it's artificial work for both users and us. If this is the only way of doing things that's totally fine.
On 5 February 2015 at 11:38, Toby Negrin tnegrin@wikimedia.org wrote:
Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based solution would be basically doing the same thing as you propose.
Can you please run it past ops (especially the 404 vs. 204 part)?
Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client-side caching, we can't use the current requests. But your point is a good one -- we'll need to add this to the PV definition.
-Toby
On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes okeyes@wikimedia.org wrote:
A nice theory, but if they appear in the webrequest table (presumably they would, and we're not creating an entirely new set of varnishes for the transmission of dummy images?) they have to be factored in. Again, however, the new definition automatically filters them by checking the webrequest source and MIME type, so this is not a problem, as I originally stated.
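Oliver's point is that the new pageview definition keys on the request source and the response MIME type, so dummy image hits on the upload cache drop out automatically. A deliberately simplified Python sketch of that kind of check follows; the `WebRequest` fields and the sets of accepted sources and MIME types are assumptions for illustration, not the actual definition.

```python
from dataclasses import dataclass


@dataclass
class WebRequest:
    """Hypothetical subset of a webrequest log row."""
    webrequest_source: str  # which cache cluster served it, e.g. "text" or "upload"
    content_type: str       # MIME type of the response


# Illustrative assumptions, not the real pageview definition.
PAGE_SOURCES = {"text", "mobile"}
PAGE_MIME_TYPES = {"text/html"}


def looks_like_pageview(req: WebRequest) -> bool:
    """Simplified source + MIME-type check of the kind described above."""
    if req.webrequest_source not in PAGE_SOURCES:
        return False  # e.g. hits on the upload or bits caches
    # Drop parameters such as "; charset=UTF-8" before comparing.
    mime = req.content_type.split(";", 1)[0].strip().lower()
    return mime in PAGE_MIME_TYPES
```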
On 5 February 2015 at 08:10, Erik Zachte ezachte@wikimedia.org wrote:
Oliver, this is not about pageviews, but about media file views.
These will be collected and dumped separately, as per
https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_count... .
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Nuria Ruiz Sent: Wednesday, February 04, 2015 22:28 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
> We would add a rule to Vagrant to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
I bet ops would like it a lot better if this is a 204, and it kind of makes sense, as it is the code used for beacons and such. Otherwise they might get alarms on 404s increasing.
> I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging?
I think a good use of EventLogging is tracking "events", not pageviews. We do not need a capsule + schema + validation system to be able to count pageviews. Plain requests would work fine; it's a much simpler use case.
Gergo Tisza, 04/02/2015 21:00:
> Do you see any fundamental problem with this?
A dummy image request seems rather reasonable. (I assume varnish can handle such load of "atypical" requests.) Making additional requests is ugly, but until we get SPDY our articles typically make dozens or hundreds of requests, so the effect looks negligible.
Nemo
> A dummy image request seems rather reasonable. (I assume varnish can handle such load of "atypical" requests.)
Right, the filtering for beacons is already in place in VCL and responses are sent right away, so as far as I know there is no better place than Varnish for this code. See example: https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/bits.inc.vcl.erb#L24
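The suggestion amounts to answering beacon-style requests at the front end with an empty 204 response instead of a 404 after a Swift lookup. The real rule lives in Varnish VCL; the Python `http.server` stand-in below only illustrates the behaviour, and the `/beacon/` prefix, the `status_for` helper and the 404 for every other path are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed path prefix for beacon-style requests.
BEACON_PREFIX = "/beacon/"


def status_for(path: str) -> int:
    """204 for beacon hits; 404 stands in for "not handled here"."""
    return 204 if path.startswith(BEACON_PREFIX) else 404


class BeaconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if status_for(self.path) == 204:
            # Answer immediately with no body and consult no backend (e.g. Swift).
            # send_response still writes the request line to the log (stderr
            # for http.server), which is what the stats would be built from.
            self.send_response(204)
            self.end_headers()
        else:
            # A real cache would forward other paths to a backend instead.
            self.send_error(404)


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a local server using BeaconHandler."""
    return HTTPServer(("127.0.0.1", port), BeaconHandler)
```

To try it locally, call `make_server().serve_forever()` and request `/beacon/media?duration=1`; the response has status 204 and an empty body.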
Our solution for this is now live.
Here's an example of a media beacon hit:
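http://bits.wikimedia.org/beacon/media?duration=3709&uri=http%3A%2F%2Fupload.beta.wmflabs.org%2Fwikipedia%2Fen%2Fthumb%2Fb%2Fb0%2FSunrise_over_fishing_boats_in_Kerala.jpg%2F640px-Sunrise_over_fishing_boats_in_Kerala.jpg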
Beta is currently hitting that endpoint, and production wikis will start doing the same once they run 1.25wmf22.
All views coming from Media Viewer will hit that endpoint. Note that some hits may be lost on browsers that don't support sendBeacon: our fallback is a simple asynchronous AJAX request (we haven't gone beyond that with local storage, replaying the event, etc.), and this event can fire when a tab or the browser is closed or when navigating away from the page. So keep in mind that a small, steady increase in these hits over a long period might simply reflect people upgrading to more modern browsers.
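Each hit carries the view as query parameters on `/beacon/media`. A small Python sketch of decoding such a hit, assuming every hit follows the layout of the Beta example (an integer `duration`, whose unit the thread doesn't state, and a percent-encoded `uri`); the function name is hypothetical.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_media_beacon(url: str):
    """Return (duration, image_url, file_name) for a media beacon hit, else None."""
    parts = urlsplit(url)
    if parts.path != "/beacon/media":
        return None
    query = parse_qs(parts.query)
    if "duration" not in query or "uri" not in query:
        return None
    duration = int(query["duration"][0])
    image_url = query["uri"][0]  # parse_qs has already decoded %3A, %2F, ...
    segments = urlsplit(image_url).path.split("/")
    # Thumbnail paths end in .../thumb/x/xy/<File>/<width>px-<File>,
    # so the original file name is the second-to-last segment.
    name = segments[-2] if "thumb" in segments else segments[-1]
    return duration, image_url, unquote(name)
```

Applied to the full Beta example hit (a 640px thumbnail of `Sunrise_over_fishing_boats_in_Kerala.jpg`), it returns a duration of 3709, the decoded thumbnail URL and that file name.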
Gilles:
And we know this data is coming via varnishkafka into the cluster, right? Did we check that?
Thanks,
Nuria
Hi Nuria,
As far as I'm aware, it only goes to the varnish logs at the moment. What happens after that hasn't been configured and is starting to be way beyond the scope of what the Multimedia team should be involved with.
CCing Ori, who set up the endpoint and knows whether hits to the beacon URI are currently picked up by varnishkafka.
Any webrequest is picked up by varnishkafka and goes into the webrequest logs in the cluster. If you want special treatment of your request, say different formatting or different logs, we’ll have to do something else :)
> What happens after that hasn't been configured and is starting to be way beyond the scope of what the Multimedia team should be involved with.
True, but we need to know on our end whether some special treatment needs to be applied to this data.
For example, if these have to be counted as pageviews, we should know. From Erik Z's reply early on, we assume these are not to be counted as pageviews but rather as "media file views", so they will "sit" in the refined tables with is_Pageview=0 and some code will harvest them to count them as "media file request counts".
Thanks,
Nuria
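The plan above (beacon hits kept in the refined tables as non-pageviews and harvested separately) could look roughly like the following aggregation. The row layout, the column names `uri_path`, `uri_query` and a boolean `is_pageview`, and the function name are assumptions for illustration; the actual job would run in the Hadoop cluster rather than over Python dicts.

```python
from collections import Counter
from urllib.parse import parse_qs, unquote, urlsplit


def count_media_file_views(rows):
    """Count Media Viewer beacon hits per file from refined webrequest rows.

    Each row is assumed to be a dict with `uri_path`, `uri_query` and a
    boolean `is_pageview` (a hypothetical simplification of the real table).
    """
    counts = Counter()
    for row in rows:
        if row["is_pageview"] or row["uri_path"] != "/beacon/media":
            continue  # pageviews and unrelated requests are counted elsewhere
        uri = parse_qs(row["uri_query"].lstrip("?")).get("uri")
        if not uri:
            continue
        segments = urlsplit(uri[0]).path.split("/")
        # For thumbnails the file name is the second-to-last path segment.
        name = segments[-2] if "thumb" in segments else segments[-1]
        counts[unquote(name)] += 1
    return counts
```

Views of different thumbnail sizes of the same file are summed under one file name, which matches the idea of counting media file views rather than individual thumbnail requests.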
Hey Gilles,
Thanks much for getting this done. We hope to announce the media file requests dump any moment now.
The initial release will be without this new data, but I hope we can incorporate it asap.
Cheers,
Erik
From: Nuria Ruiz [mailto:nuria@wikimedia.org] Sent: Thursday, March 19, 2015 17:47 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Cc: Erik Zachte Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
> What happens after that hasn't been configured and is starting to be way beyond the scope of what the Multimedia team should be involved with.
True, but we need to know on our end whether some special treatment needs to be applied to this data.
For example, if these have to be counted as pageviews, we should know. From Erik Z's reply early on, we assume these are not to be counted as pageviews but rather as "media file views", so they will "sit" in the refined tables with is_Pageview=0 and some code will harvest them to count them as "media file request counts".
Thanks,
Nuria