Analytics February 2023

analytics@lists.wikimedia.org

6 participants
3 discussions

API Outages
by Joshua Haecker 03 Mar '23


Hi all,

Just curious if there is a known cause for the multiple long delays we've had on the AQS API data being available this week? I know periodic delays are not uncommon, but these seem beyond normal levels.

Thanks!
~Josh


[Wikimedia Research Showcase] February 15 at 9:30AM PT, 17:30 UTC
by Emily Lescak 15 Feb '23


Hello everyone,

The next Research Showcase will be livestreamed next Wednesday, February 15 at 9:30AM PT / 17:30 UTC. The theme is The Free Knowledge Ecosystem.

YouTube stream: https://www.youtube.com/watch?v=8VJmR-3lTac

We welcome you to join the conversation on IRC at #wikimedia-research. You can also watch our past research showcases: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase

This month's presentations:

The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it affects map completeness and inequalities in OSM
By *Benjamin Herfort, Heidelberg Institute for Geoinformation Technology*

Mapping efforts of communities in OpenStreetMap (OSM) over the previous decade have created a unique global geographic database, which is accessible to all with no licensing costs. The collaborative maps of OSM have been used to support humanitarian efforts around the world as well as to fill important data gaps for implementing major development frameworks such as the Sustainable Development Goals (SDGs). Besides the well-examined Global North - Global South bias in OSM, the OSM data as of 2023 shows a much more spatially diverse spread pattern than previously considered, which was shaped by regional, socio-economic and demographic factors across several scales. Humanitarian mapping efforts of the previous decade have already made OSM more inclusive, contributing to diversify and expand the spatial footprint of the areas mapped. However, methods to quantify and account for the remaining biases in OSM's coverage are needed so that researchers and practitioners will be able to draw the right conclusions, e.g. about progress towards the SDGs in cities.

Dataset reuse: Toward translating principles to practice
By *Laura Koesten, University of Vienna*

The web provides access to millions of datasets. These data can have additional impact when used beyond the context for which they were originally created. But using a dataset beyond the context in which it originated remains challenging. Simply making data available does not mean it will be or can be easily used by others. At the same time, we have little empirical insight into what makes a dataset reusable and which of the existing guidelines and frameworks have an impact.

In this talk, I will discuss our research on what makes data reusable in practice. This is informed by a synthesis of literature on the topic, our studies on how people evaluate and make sense of data, and a case study on datasets on GitHub. In the case study, we describe a corpus of more than 1.4 million data files from over 65,000 repositories. Building on reuse features from the literature, we use GitHub's engagement metrics as proxies for dataset reuse and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between principles and actionable insights that might allow data publishers and tool designers to implement functionalities that facilitate reuse.

We hope you can join us!

Warm regards,
Emily

--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation


Re: energy used to store
by Andrew Otto 02 Feb '23


Hi Willy,

(Forwarding your question to the public analytics list for others who might know more.)

> Do you have any data that shows how many times audio files were downloaded in 2022?

I think your best bet is the Mediacounts dataset <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts>, which is available in a public API <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>.

E.g., to get the number of audio downloads requested in 2022:
https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate/all-refer…

However, it doesn't look like data transfer details are available in the public API. The backing dataset in Hive does have a total_response_size field, so you could probably get this info more specifically by querying for it in Hive.

Good luck!

On Wed, Feb 1, 2023 at 7:11 PM Willy Pao <wpao(a)wikimedia.org> wrote:

> Hey Andrew - hope all is going well. I've been working on gathering some
> data for Wikimedia's Annual Sustainability Report, and there was a question
> that Deb sent over regarding the usage of Audio files. With Jaime's help
> from Data Persistence SRE, we were able to figure out some of the numbers
> around storage and energy consumption. There was one part I was hoping you
> (or someone from your team) might be able to help with though. Do you have
> any data that shows how many times audio files were downloaded in 2022?
> Much appreciated in advance.
>
> Thanks,
> Willy
>
> ---------- Forwarded message ---------
> From: Deb Tankersley <dtankersley(a)wikimedia.org>
> Date: Mon, Jan 30, 2023 at 1:41 PM
> Subject: energy used to store
> To: Willy Pao <wpao(a)wikimedia.org>, Erin Morris <emorris(a)wikimedia.org>,
> Cassie Casares <ccasares(a)wikimedia.org>
>
> Hey Willy!
>
> I got an interesting question (bolded below) from Wikimedia Sweden on the
> energy that we use to store and serve audio files. Here's their full
> comment / question:
>
>> *"As part of my yearly planning for 2023, we are conducting a study
>> regarding digitization of audio tapes, which climate footprints the various
>> stages in the process generate and whether some of these can be made more
>> energy efficient. We have limited the study to audio tapes, because it is a
>> prioritized material category and a very data-intensive business, and
>> because the limitation hopefully gives us relatively accurate numbers.
>> Since we have been publishing digital audio originally from audio tapes on
>> Wikimedia Commons for the past few years, I was wondering if there are any
>> statistics related to energy consumption and carbon dioxide emissions
>> available?*
>>
>> *What we would like to know is how much energy is required in the year
>> 2022 to store our total amount of uploaded audio files (with the exception
>> of Karl Tirén's phonograph recordings), how many times they have been
>> downloaded and how large a total amount of data is involved. We suspect
>> that downloading the high-resolution audio files is also relatively data
>> intensive. As mentioned, the goal is not to stop this activity, or even
>> reduce it without seeing how it looks and then investigating whether there
>> are any links in the chain that can be tweaked to possibly reduce the
>> climate impact. If numbers cannot be obtained, this is also valuable
>> information."*
>
> I'm not sure if we can narrow down this enough to get them a decent /
> solid answer. What are your thoughts?
>
> Thanks,
>
> Deb
>
> --
> deb tankersley (she/her)
> senior program manager, engineering
> Wikimedia Foundation
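
For readers who want to try the public Mediarequests API route suggested in the reply above, here is a minimal Python sketch. It is not taken from the truncated link in the message. The path layout (referer / media type / agent / granularity / start / end), the parameter values, and the response shape (an `items` list whose entries carry a `requests` count) are assumptions based on the public AQS Mediarequests documentation, and the helper names are hypothetical. The sample payload contains made-up numbers, used only to show the summing step.

```python
from urllib.parse import quote

# Base path of the aggregate Mediarequests endpoint (assumed from the AQS docs).
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate"


def build_mediarequests_url(referer, media_type, agent, granularity, start, end):
    """Join the six path parameters into a request URL (hypothetical helper)."""
    parts = [referer, media_type, agent, granularity, start, end]
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def total_requests(payload):
    """Sum the 'requests' field over all items of a decoded JSON response.

    Assumes the response looks like {"items": [{"requests": <int>, ...}, ...]}.
    """
    return sum(item["requests"] for item in payload.get("items", []))


# Assumed parameter values: all referers, audio only, human users, monthly buckets.
url = build_mediarequests_url(
    "all-referers", "audio", "user", "monthly", "20220101", "20230101"
)

# Illustrative payload with made-up counts, NOT real data.
sample = {
    "items": [
        {"timestamp": "2022010100", "requests": 10},
        {"timestamp": "2022020100", "requests": 5},
    ]
}
sample_total = total_requests(sample)
```

To use it against the live service, fetch `url` with any HTTP client, decode the JSON body, and pass the result to `total_requests`. As the reply notes, this only yields request counts; transfer sizes would need the `total_response_size` field in the Hive dataset.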

