Hi Analytics,
How do I determine how many times this video https://commons.wikimedia.org/wiki/File:Wikipedia_5_million_articles_milestone_video_November_2015.ogv has been played in the last 90 days?
Thanks,
Pine
Pine, right now you can either query Hive if you have access to the cluster, or you can download the days you're interested in from here: http://dumps.wikimedia.org/other/mediacounts/daily/2015/ and crunch the numbers for the files you're interested in (not too bad).
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi Dan,
I have a Labs account which I've barely used. Is access to the cluster a separate step from having access to Labs?
Also, is there a "how to" guide somewhere for how to query the cluster?
Thanks, Pine
Hi Pine,
Yes, you need stat1002 access to run Hive queries. It's not the same as Labs. There's plenty of documentation on how to request access and how to query data here: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive
I should caution that for idle questions I sincerely doubt cluster access will be given; there's no way of partitioning it so that you can't access, say, random readers' IP addresses ;p
I don't need, nor want, access to any data about unique readers/viewers. Is there a way of sanitizing the data? There is public data about pageviews so it seems to me that there should also be public data about media playback.
Alternatively, is there a way that I can file a request with someone who has the correct permissions to ask them to run the appropriate query?
Pine
There's a task in our backlog to publish this data as part of the API - https://phabricator.wikimedia.org/T88775.
The task Madhu pointed to is something we hope to get to soon, but we may find that we don't have enough hardware to host the extra data on the API. Are the dumps a bad solution for this problem, though? They're not too big in this case and not too hard to parse through for the small time period you're talking about. The format is pretty simple too; I think you should be able to download and do everything you need with grep and awk. Or python if that's easier.
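For illustration, here is a minimal Python sketch of the download-and-crunch approach. It is not an official tool, and it rests on assumptions that should be checked against the format notes published alongside the dumps: that each daily file is a bz2-compressed, tab-separated file named like *.tsv.bz2, that the first column is the file's path ending in its name, and that the third column is the total number of transfers. The function names and the count_column parameter are hypothetical.

```python
import bz2
from pathlib import Path


def count_transfers(lines, file_name, count_column=2):
    """Sum one numeric column over rows whose path ends with /file_name.

    count_column=2 (the third field) is an ASSUMPTION about where the
    total transfer count lives; verify it against the dump's documentation.
    """
    suffix = "/" + file_name
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= count_column:
            continue  # skip short or malformed rows
        if not fields[0].endswith(suffix):
            continue  # a different file
        try:
            total += int(fields[count_column])
        except ValueError:
            continue  # skip rows with a non-numeric value in that column
    return total


def count_transfers_in_dumps(dump_dir, file_name, count_column=2):
    """Apply count_transfers to every *.tsv.bz2 file in dump_dir (assumed naming)."""
    total = 0
    for path in sorted(Path(dump_dir).glob("*.tsv.bz2")):
        with bz2.open(path, "rt", encoding="utf-8") as handle:
            total += count_transfers(handle, file_name, count_column)
    return total
```

To cover a 90-day window, download the daily files for those dates into one directory and call count_transfers_in_dumps on it with the file's name, for example Wikipedia_5_million_articles_milestone_video_November_2015.ogv. Names containing characters that the dumps may store escaped would need the same escaping applied before matching.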
Dan Andreescu, 15/12/2015 03:43:
Or python if that's easier.
https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py is very easy to use. Download from dumps.wikimedia.org is tragically slow, making any one-time analysis impractical, but /data/scratch/tmp/mediacounts on Labs has a copy of October data.
Nemo
Nemo, that's really good information, thank you. I'm going to ask a hypothetical and I haven't done my due diligence yet. If we kept the last month of mediacounts data in the pageview API, would that be useful? That way we might be able to find the space and it won't grow in an unbounded way.
Making sure that I'm understanding this correctly: if I use https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py:
1. Does the data reflect views through the media players on Wikipedias and other non-Commons sites?
2. Does the data reflect the number of views *and* downloads in all image sizes and formats?
3. Is the transfer count information available indefinitely, or only for 90 days?
Pine