Hi Willy,
(Forwarding your question to the public analytics list for others who might know more.)
> Do you have any data that shows how many times audio files were downloaded in 2022?
I think your best bet is the Mediacounts dataset https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts, which is available through a public API https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests. E.g., to get the number of audio downloads requested in 2022: https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate/all-refere...
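For anyone who wants to script this, here is a minimal Python sketch of building an aggregate request and summing the counts it returns. It is not a reconstruction of the truncated link above: the path segments after `aggregate/` (referer, media type, agent, granularity, start, end), the `items` / `requests` response fields, and the date bounds are assumptions based on the AQS Mediarequests documentation and should be checked there, in particular how the end date is treated for monthly granularity. The User-Agent string is a placeholder.

```python
import json
import urllib.request

# Base path as it appears in the thread; everything appended to it below
# is an assumption taken from the AQS Mediarequests documentation.
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate"


def build_aggregate_url(start, end, referer="all-referers", media_type="audio",
                        agent="all-agents", granularity="monthly"):
    """Join the assumed path segments into an aggregate request URL."""
    return "/".join([BASE, referer, media_type, agent, granularity, start, end])


def total_requests(payload):
    """Sum the assumed 'requests' field over every entry in 'items'."""
    return sum(item["requests"] for item in payload.get("items", []))


def fetch_total(url):
    """Fetch the URL and return the summed request count (needs network)."""
    req = urllib.request.Request(
        url,
        # Placeholder User-Agent; replace with real contact details.
        headers={"User-Agent": "audio-requests-sketch/0.1 (you@example.org)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return total_requests(json.load(resp))


if __name__ == "__main__":
    # Illustrative bounds for calendar year 2022.
    print(build_aggregate_url("20220101", "20221231"))
```

Calling `fetch_total(build_aggregate_url("20220101", "20221231"))` would then return one number for the whole period, provided the assumptions above hold.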
However, it doesn't look like data transfer details are available in the public API. The backing dataset in Hive does have a total_response_size field, so you could probably get the transfer volume by querying Hive directly.
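As a rough illustration of what such a Hive query might look like, the Python sketch below only builds the query text; it does not connect to Hive. The `total_response_size` field is the one mentioned above and the `wmf.mediarequest` table name comes from the DataHub link posted in this thread. The `media_classification` column, its `'audio'` value, and the `year` partition column are assumptions and need to be checked against the actual schema before running anything.

```python
def build_audio_bytes_query(year,
                            table="wmf.mediarequest",
                            size_column="total_response_size",
                            type_column="media_classification",  # assumed name
                            audio_value="audio"):                # assumed value
    """Return HiveQL text that sums response bytes for audio in one year.

    Only the size column and the table name come from the thread; the
    filter column, its value, and the 'year' partition are hypothetical.
    """
    if not isinstance(year, int) or not 2000 <= year <= 2100:
        raise ValueError("year must be an integer between 2000 and 2100")
    return (
        f"SELECT SUM({size_column}) AS total_bytes\n"
        f"FROM {table}\n"
        f"WHERE year = {year}\n"
        f"  AND {type_column} = '{audio_value}'"
    )


if __name__ == "__main__":
    print(build_audio_bytes_query(2022))
```

Keeping the column names as parameters makes it easy to swap in the real ones once the schema has been confirmed.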
Good luck!
On Wed, Feb 1, 2023 at 7:11 PM Willy Pao wpao@wikimedia.org wrote:
Hey Andrew - hope all is going well. I've been working on gathering some data for Wikimedia's Annual Sustainability Report, and there was a question that Deb sent over regarding the usage of Audio files. With Jaime's help from Data Persistence SRE, we were able to figure out some of the numbers around storage and energy consumption. There was one part I was hoping you (or someone from your team) might be able to help with though. Do you have any data that shows how many times audio files were downloaded in 2022? Much appreciated in advance.
Thanks, Willy
---------- Forwarded message ---------
From: Deb Tankersley dtankersley@wikimedia.org
Date: Mon, Jan 30, 2023 at 1:41 PM
Subject: energy used to store
To: Willy Pao wpao@wikimedia.org, Erin Morris emorris@wikimedia.org, Cassie Casares ccasares@wikimedia.org
Hey Willy!
I got an interesting question (bolded below) from Wikimedia Sweden on the energy that we use to store and serve audio files. Here's their full comment / question:
*"As part of my yearly planning for 2023, we are conducting a study regarding digitization of audio tapes, which climate footprints the various stages in the process generate and whether some of these can be made more energy efficient. We have limited the study to audio tapes, because it is a prioritized material category and a very data-intensive business, and because the limitation hopefully gives us relatively accurate numbers. Since we have been publishing digital audio originally from audio tapes on Wikimedia Commons for the past few years, I was wondering if there are any statistics related to energy consumption and carbon dioxide emissions available?*
*What we would like to know is how much energy is required in the year 2022 to store our total amount of uploaded audio files (with the exception of Karl Tirén's phonograph recordings), how many times they have been downloaded and how large a total amount of data is involved. We suspect that downloading the high-resolution audio files is also relatively data intensive. As mentioned, the goal is not to stop this activity, or even reduce it without seeing how it looks and then investigating whether there are any links in the chain that can be tweaked to possibly reduce the climate impact. If numbers cannot be obtained, this is also valuable information."*
I'm not sure if we can narrow this down enough to get them a decent / solid answer. What are your thoughts?
Thanks,
Deb
--
deb tankersley (she/her)
senior program manager, engineering
Wikimedia Foundation
Yep, so that's the best data I know of as well. The table that backs the public API is documented here https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,wmf.mediarequest,PROD)/Schema?is_lineage_mode=false&schemaFilter=. And we have a visualization of this in Wikistats, where you can filter down to just audio files https://stats.wikimedia.org/#/all-projects/content/total-mediarequests/normal|bar|2-year|media_type~audio|monthly .
Happy to help slice and dice the data. You can post questions here or ping me.
On Thu, Feb 2, 2023 at 1:25 PM Andrew Otto otto@wikimedia.org wrote:
[quoted message trimmed; it appears in full above]
Analytics mailing list -- analytics@lists.wikimedia.org To unsubscribe send an email to analytics-leave@lists.wikimedia.org
Hi Andrew and Dan - thanks so much for the quick reply. This was super helpful, and we'll be sure to pass the info along to the folks requesting the data.
Thanks, Willy
On Thu, Feb 2, 2023 at 10:39 AM Dan Andreescu dandreescu@wikimedia.org wrote:
[quoted thread trimmed; it appears in full above]