Analytics May 2022

analytics@lists.wikimedia.org

3 participants
4 discussions

[event] Wiki Workshop 2022 - Registration open
by Leila Zia 07 Jun '22


Hi all,

The registration for Wiki Workshop 2022 [1] is now open. The event will be held virtually on April 25, 12:00-18:30 UTC, as part of The Web Conference 2022 [2]. The plenary parts of the event will be recorded and shared publicly afterwards.

Wiki Workshop is the largest Wikimedia research event of the year (so far ;) that the Research team at the Wikimedia Foundation co-organizes with our Research Fellow, Bob West (EPFL). This year, Srijan Kumar (Georgia Tech) joined the organizing team as well. :) The event brings together scholars and researchers from across the world who are interested in or actively engaged with research and development on the Wikimedia projects.

While the details of the schedule are to be finalized and posted in the coming week, we expect to generally follow the format of 2021 [3]. This year we received research submissions from more than 20 countries and have accepted 27 research papers, whose authors will present their work as part of the workshop. If you are an author of an accepted paper: congrats! :) Our keynote speaker is Larry Lessig [4], and we will have a panel to reflect on the tenth anniversary of SOPA/PIPA, moderated by Erik Moeller (Freedom of the Press). And of course, all the music, games, etc. will remain. :)

If you are interested in participating in the live event, please indicate your interest by filling out [5]. Anyone is encouraged to register: you don't have to be a researcher. In the registration form, please explain why attending the live event will support you in your work on the Wikimedia projects and beyond.

If you have questions, please don't hesitate to reach out.

Best,
Leila

[1] https://wikiworkshop.org/2022/
[2] https://www2022.thewebconf.org/
[3] https://wikiworkshop.org/2021/#schedule
[4] https://hls.harvard.edu/faculty/directory/10519/Lessig
[5] (privacy statement for the Google form survey [6]) https://docs.google.com/forms/d/e/1FAIpQLSctlkUv8FasB2Nc4RvThnxAbjPzUwmnxB2…
[6] https://foundation.wikimedia.org/wiki/Legal:Wiki_Workshop_Registration_Priv…

--
Leila Zia
Head of Research
Wikimedia Foundation


Earlier access to Pageviews hourly raw data files
by Maxim Aparovich 16 May '22


Dear Sir or Madam,

I am writing to you with a question about Pageviews hourly raw data files <https://dumps.wikimedia.org/other/pageviews/readme.html>. First of all, please let me know whether I have chosen the right contact for this question. If not, could you please advise to whom I should direct it? The question is below.

I am working on a project in which we would like to use Pageviews hourly data <https://dumps.wikimedia.org/other/pageviews/readme.html>. For us, it is crucial to get the data as soon as possible. As I can see on the web page, hourly data is available in Wikimedia's file system approximately 45 minutes after the hour ends. For an end user, however, it only becomes available several hours after that (this is shown in the screenshot).

Could you help us by answering the following questions:

1. Is there any way to get the data as soon as it is available on the Wikimedia filesystem (~45 min after the hour ends)?
2. Are there any other, faster ways to get hourly data? For instance, faster access to the raw data files, or access to *wmf.pageview_hourly <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_ho…>* or to *wmf.pageview_actor <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_ac…>*. Unfortunately, the API does not provide data at an hourly level.

Best regards,
Maxim Aparovich

[image: wiki-email.png]
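For readers following this thread, here is a minimal Python sketch of how one might poll the public dumps site for a given hour's file. It does not make the data appear any sooner; it only shows one way to detect publication promptly. The URL layout in `hourly_dump_url` and the four-field record layout in `parse_line` are assumptions based on the readme linked above and should be checked against it. The function names and the User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Assumed root of the hourly pageviews dumps (see the readme linked above).
BASE_URL = "https://dumps.wikimedia.org/other/pageviews"


def hourly_dump_url(hour: datetime) -> str:
    """Build the assumed URL of the file whose name carries this hour's timestamp."""
    if hour.tzinfo is None:
        # Treat naive datetimes as UTC rather than local time.
        hour = hour.replace(tzinfo=timezone.utc)
    hour = hour.astimezone(timezone.utc)
    return (
        f"{BASE_URL}/{hour:%Y}/{hour:%Y-%m}/"
        f"pageviews-{hour:%Y%m%d}-{hour:%H}0000.gz"
    )


def is_published(url: str) -> bool:
    """Send a HEAD request and report whether the file exists yet."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        # Placeholder User-Agent; replace with your own contact details.
        headers={"User-Agent": "example-script/0.1 (you@example.org)"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Typically a 404 while the file has not been published.
        return False


def parse_line(line: str) -> tuple[str, str, int]:
    """Split one record, assumed to be
    'domain_code page_title count_views total_response_size',
    into (domain_code, page_title, view_count)."""
    domain_code, page_title, count_views, _size = line.rstrip("\n").split(" ")
    return domain_code, page_title, int(count_views)
```

A caller could run `is_published(hourly_dump_url(some_hour))` on a timer with a polite delay between attempts, then download and decompress the file once it returns `True`.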


Wikimedia Research Showcase May 18
by Emily Lescak 11 May '22


Hello everyone,

The next Research Showcase, *Gaps and Biases in Wikipedia*, will be live-streamed Wednesday, May 18, at 9:30 AM PDT/16:30 UTC. View your local time here <https://zonestamp.toolforge.org/1652891400>.

YouTube stream: https://www.youtube.com/watch?v=Q8FlunZ0mH4

You are welcome to ask questions via YouTube chat or on IRC at #wikimedia-research.

This month's presentations:

Ms. Categorized: Gender, notability, and inequality on Wikipedia
By Francesca Tripodi (University of North Carolina at Chapel Hill)

For the last five decades, sociologists have argued that gender is one of the most pervasive and insidious forms of inequality. Research demonstrates how these inequalities persist on Wikipedia - arguably the largest encyclopedic reference in existence. Roughly eighty percent of Wikipedia's editors are men, and pages about women and women's interests are underrepresented. English language Wikipedia contains more than 1.5 million biographies about notable writers, inventors, and academics, but less than nineteen percent of these biographies are about women. To try and improve these statistics, activists host “edit-a-thons” to increase the visibility of notable women. While this strategy helps create several biographies that did not previously exist, it fails to address a more inconspicuous form of gender exclusion. Drawing on ethnographic observations, interviews, and quantitative analysis of web-scraped metadata, this talk demonstrates that women’s biographies are more frequently considered non-notable and nominated for deletion compared to men’s biographies. This disproportionate rate is another dimension of gender inequality on Wikipedia previously unexplored by social scientists and provides broader insights into how women’s achievements are (under)valued in society.

Controlled Analyses of Social Biases in Wikipedia Bios
By Yulia Tsvetkov (University of Washington)

Social biases on Wikipedia could greatly influence public opinion. Wikipedia is also a popular source of training data for NLP models, and subtle biases in Wikipedia narratives are liable to be amplified in downstream NLP models. In this talk I'll present two approaches to unveiling social biases in how people are described on Wikipedia, across demographic attributes and across languages. First, I'll present a methodology that isolates dimensions of interest (e.g., gender) from other attributes (e.g., occupation). This methodology allows us to quantify systemic differences in coverage of different genders and races, while controlling for confounding factors. Next, I'll show an NLP case study that uses this methodology in combination with people-centric sentiment analysis to identify disparities in Wikipedia bios of members of the LGBTQIA+ community across three languages: English, Russian, and Spanish. Our results surface cultural differences in narratives and signs of social biases. Practically, these methods can be used to automatically identify Wikipedia articles for further manual analysis: articles that might contain content gaps or an imbalanced representation of particular social groups.

You can also watch our past research showcases here: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase

Emily, on behalf of the Research team

--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation


Wikimedia AQS Pageviews API Question
by Ben Smith 04 May '22


Hello all,

We use the Wikimedia AQS Pageviews REST API: [Analytics/AQS/Pageviews - Wikitech](https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews).

When requesting pageview counts by article, we have noticed that data for the latest day does not become available for all pages at the same time. Some pages appear to be updated later than others.

Is there a place we can check (e.g. a status page or dump files) to determine whether all pageview data for the latest day is accessible via the AQS Pageviews REST API?

Best,
Ben
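As an illustration of the per-article requests described above, the following Python sketch builds a request URL and inspects a decoded response to see which days it covers. It is not a substitute for the status page the message asks about; it only checks one page at a time. The helper names are hypothetical, and the assumed response shape (an `items` list whose entries carry a `timestamp` string in `YYYYMMDDHH` form) should be confirmed against the Wikitech page linked above.

```python
from typing import Optional
from urllib.parse import quote

# Per-article endpoint of the AQS Pageviews REST API.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "daily") -> str:
    """Build a per-article request URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{API_ROOT}/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")


def latest_timestamp(payload: dict) -> Optional[str]:
    """Return the newest timestamp in a decoded response, or None if empty."""
    timestamps = [item["timestamp"] for item in payload.get("items", [])]
    # Fixed-width digit strings sort chronologically.
    return max(timestamps, default=None)


def has_data_for(payload: dict, day: str) -> bool:
    """Report whether the response contains an entry for the YYYYMMDD day."""
    return any(item["timestamp"].startswith(day)
               for item in payload.get("items", []))
```

A client could fetch the URL for each page of interest, decode the JSON body, and retry later for any page where `has_data_for(payload, latest_day)` is still `False`.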

