Hi all,

To help bridge Wikipedia’s visual knowledge gaps, the Research team at the Wikimedia Foundation has launched the “Wikipedia Image/Caption Matching Competition”.

Read on for more information or check out our blog post!

Images are essential for knowledge sharing, learning, and understanding. However, most images in Wikipedia articles lack written context (e.g., captions, alt-text), which often makes them inaccessible to readers, including those who rely on screen readers. As part of our initiatives to address Wikipedia’s knowledge gaps, the Research team at the Wikimedia Foundation is hosting the “Wikipedia Image/Caption Matching Competition.” We invite communities of volunteers, developers, data scientists, and machine learning enthusiasts to develop systems that can automatically associate images with their corresponding captions and article titles.

In this competition (hosted on Kaggle), participants are given content from Wikipedia articles in more than 100 language editions and are asked to build systems that automatically retrieve the text (an image caption or an article title) that best matches a query image. The data combines Google AI’s recently released WIT (Wikipedia-based Image Text) dataset with a new dataset of 6 million images from Wikimedia Commons that we have released for this competition. Kaggle hosts all the data needed to get started, along with example notebooks, a forum where participants can share ideas and collaborate, and the submitted models in open-source formats.
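For anyone new to this kind of retrieval task, here is a minimal sketch of the ranking step. It assumes the query image and each candidate text have already been mapped into a shared vector space by some image-text embedding model (not specified here; the vectors and captions below are made up for illustration). Candidates are ranked by cosine similarity, which is one common approach and not the competition's official baseline or scoring method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(image_vec: Sequence[float],
                    candidates: Dict[str, Sequence[float]],
                    top_k: int = 5) -> List[str]:
    """Return the top_k candidate texts most similar to the query image.

    Assumption: image_vec and the candidate vectors come from the same
    (hypothetical) joint image-text embedding model.
    """
    scored = sorted(candidates.items(),
                    key=lambda item: cosine(image_vec, item[1]),
                    reverse=True)  # highest similarity first
    return [text for text, _ in scored[:top_k]]


# Toy example with invented 3-dimensional embeddings.
image = [1.0, 0.0, 0.0]
candidates = {
    "A red apple on a table": [0.9, 0.1, 0.0],
    "Eiffel Tower at night": [0.0, 1.0, 0.2],
    "Apple (fruit)": [0.7, 0.3, 0.1],
}
print(rank_candidates(image, candidates, top_k=2))
# prints ['A red apple on a table', 'Apple (fruit)']
```

In practice, most of the work lies in producing good embeddings for images and multilingual text; the ranking itself can stay this simple.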

We encourage everyone to download our data and participate in the competition. This challenge is an opportunity for people around the world to grow their technical skills while increasing the accessibility of Wikipedia.

This competition is possible thanks to collaborations with Google Research, EPFL, Naver Labs Europe and Hugging Face, who assisted with data preparation and competition design. Check out our blog post for more information! The point of contact for this project is Miriam Redi. You're welcome to reach out with questions or comments at miriam@wikimedia.org.

Cheers,

Emily Lescak, on behalf of the Research team

Emily Lescak (she / her)

Senior Research Community Officer

The Wikimedia Foundation