Hi all,

The next Research Showcase will be live-streamed on Wednesday, May 20, at 9:30 AM PDT/16:30 UTC.

This month we will learn about recent research on machine learning systems that rely on human supervision for their learning and optimization, a research area commonly referred to as Human-in-the-Loop ML. In the first talk, Jie Yang will present a computational framework that relies on crowdsourcing to identify influencers in social networks (Twitter) by selectively obtaining labeled data. In the second talk, Estelle Smith will discuss the role of the community in maintaining ORES, the machine learning system whose quality predictions are used in numerous Wikipedia applications.

YouTube stream: https://www.youtube.com/watch?v=8nDiu2ebdOI

As usual, you can join the conversation on IRC at #wikimedia-research. You can also watch our past research showcases here: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase

This month's presentations:

OpenCrowd: A Human-AI Collaborative Approach for Finding Social Influencers via Open-Ended Answers Aggregation

By: Jie Yang, Amazon (current), Delft University of Technology (starting soon)

Finding social influencers is a fundamental task in many online applications ranging from brand marketing to opinion mining. Existing methods heavily rely on the availability of expert labels, whose collection is usually a laborious process even for domain experts. Using open-ended questions, crowdsourcing provides a cost-effective way to find a large number of social influencers in a short time. Individual crowd workers, however, only possess fragmented knowledge that is often of low quality. To tackle those issues, we present OpenCrowd, a unified Bayesian framework that seamlessly incorporates machine learning and crowdsourcing for effectively finding social influencers. To infer a set of influencers, OpenCrowd bootstraps the learning process using a small number of expert labels and then jointly learns a feature-based answer quality model and the reliability of the workers. Model parameters and worker reliability are updated iteratively, allowing their learning processes to benefit from each other until an agreement on the quality of the answers is reached. We derive a principled optimization algorithm based on variational inference with efficient updating rules for learning OpenCrowd parameters. Experimental results on finding social influencers in different domains show that our approach substantially improves the state of the art by 11.5% AUC. Moreover, we empirically show that our approach is particularly useful in finding micro-influencers, who are very directly engaged with smaller audiences.

Paper: https://dl.acm.org/doi/fullHtml/10.1145/3366423.3380254

Keeping Community in the Machine-Learning Loop

By: C. Estelle Smith, MS, PhD Candidate, GroupLens Research Lab at the University of Minnesota

On Wikipedia, sophisticated algorithmic tools are used to assess the quality of edits and take corrective actions. However, algorithms can fail to solve the problems they were designed for if they conflict with the values of communities who use them. In this study, we take a Value-Sensitive Algorithm Design approach to understanding a community-created and -maintained machine learning-based algorithm called the Objective Revision Evaluation System (ORES), a quality prediction system used in numerous Wikipedia applications and contexts. Five major values converged across stakeholder groups that ORES (and its dependent applications) should: (1) reduce the effort of community maintenance, (2) maintain human judgement as the final authority, (3) support differing peoples’ differing workflows, (4) encourage positive engagement with diverse editor groups, and (5) establish trustworthiness of people and algorithms within the community. We reveal tensions between these values and discuss implications for future research to improve algorithms like ORES.

Paper: https://commons.wikimedia.org/wiki/File:Keeping_Community_in_the_Loop-_Understanding_Wikipedia_Stakeholder_Values_for_Machine_Learning-Based_Systems.pdf

Janna Layton (she, her)

Administrative Assistant - Product & Technology