Hello everyone,

The next Research Showcase, Gaps and Biases in Wikipedia, will be live-streamed Wednesday, May 18, at 9:30 AM PDT/16:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=Q8FlunZ0mH4 

You are welcome to ask questions via YouTube chat or on IRC at #wikimedia-research. 

This month's presentations: 

Ms. Categorized: Gender, notability, and inequality on Wikipedia

By Francesca Tripodi (University of North Carolina at Chapel Hill)

For the last five decades, sociologists have argued that gender is one of the most pervasive and insidious forms of inequality. Research demonstrates how these inequalities persist on Wikipedia, arguably the largest encyclopedic reference in existence. Roughly eighty percent of Wikipedia's editors are men, and pages about women and women's interests are underrepresented. English-language Wikipedia contains more than 1.5 million biographies about notable writers, inventors, and academics, but fewer than nineteen percent of these biographies are about women. To improve these statistics, activists host “edit-a-thons” to increase the visibility of notable women. While this strategy helps create several biographies that did not previously exist, it fails to address a more inconspicuous form of gender exclusion. Drawing on ethnographic observations, interviews, and quantitative analysis of web-scraped metadata, this talk demonstrates that women’s biographies are more frequently considered non-notable and nominated for deletion than men’s biographies. This disproportionate rate is another dimension of gender inequality on Wikipedia, previously unexplored by social scientists, and provides broader insights into how women’s achievements are (under)valued in society.

Controlled Analyses of Social Biases in Wikipedia Bios

By Yulia Tsvetkov (University of Washington)

Social biases on Wikipedia could greatly influence public opinion. Wikipedia is also a popular source of training data for NLP models, and subtle biases in Wikipedia narratives are liable to be amplified in downstream NLP models. In this talk I'll present two approaches to unveiling social biases in how people are described on Wikipedia, across demographic attributes and across languages. First, I'll present a methodology that isolates dimensions of interest (e.g., gender) from other attributes (e.g., occupation). This methodology allows us to quantify systemic differences in coverage of different genders and races, while controlling for confounding factors. Next, I'll show an NLP case study that uses this methodology in combination with people-centric sentiment analysis to identify disparities in Wikipedia bios of members of the LGBTQIA+ community across three languages: English, Russian, and Spanish. Our results surface cultural differences in narratives and signs of social biases. Practically, these methods can be used to automatically identify Wikipedia articles for further manual analysis, namely articles that might contain content gaps or an imbalanced representation of particular social groups.


You can also watch our past research showcases here: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase


Emily, on behalf of the Research team


--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation