The Research Showcase and that coveted human interaction via a Q&A is starting in about 30 minutes! 

On Thu, Mar 12, 2020 at 12:29 PM Janna Layton wrote:

Hi all,

The next Research Showcase will be live-streamed on Wednesday, March 18, at 9:30 AM PDT/16:30 UTC. We’ll have a presentation on topic modeling by Jordan Boyd-Graber. A question-and-answer session will follow.

YouTube stream:

As usual, you can join the conversation on IRC at #wikimedia-research. You can also watch our past research showcases here:

This month's presentation:

Big Data Analysis with Topic Models: Evaluation, Interaction, and Multilingual Extensions

By: Jordan Boyd-Graber, University of Maryland

A common information need is to understand large, unstructured datasets: millions of e-mails during e-discovery, a decade's worth of science correspondence, or a day's tweets. In the last decade, topic models have become a common tool for navigating such datasets, even across languages. This talk investigates the foundational research that allows successful tools for these data exploration tasks: how to know when you have an effective model of the dataset; how to correct bad models; how to measure topic model effectiveness; and how to detect framing and spin using these techniques. After introducing topic models, I argue that traditional measures of topic model quality, borrowed from machine learning, are inconsistent with how topic models are actually used. In response, I describe interactive topic modeling, a technique that enables users to impart their insights and preferences to models in a principled, interactive way. I will then address measuring topic model effectiveness in real-world tasks.

Overview of topic models:

Topic model evaluation:

Interactive topic modeling:

Topic Models for Categorization:

Janna Layton (she, her)
Administrative Assistant - Product & Technology 
