Hi all,
To help bridge Wikipedia’s visual knowledge gaps, the Research team https://research.wikimedia.org/ at the Wikimedia Foundation has launched the “Wikipedia Image/Caption Matching Competition https://www.kaggle.com/c/wikipedia-image-caption”.
Read on for more information or check out our blog post https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-challenge-and-a-huge-release-of-image-data-for-research/ !
Images are essential for knowledge sharing, learning, and understanding. However, the majority of images in Wikipedia articles lack written context (e.g., captions, alt-text), often making them inaccessible. As part of our initiatives https://research.wikimedia.org/knowledge-gaps.html to address Wikipedia’s knowledge gaps, the Research https://research.wikimedia.org/ team at the Wikimedia Foundation is hosting the “Wikipedia Image/Caption Matching Competition https://www.kaggle.com/c/wikipedia-image-caption.” We invite communities of volunteers, developers, data scientists, and machine learning enthusiasts to develop systems that can automatically associate images with their corresponding captions and article titles.
In this competition (hosted on Kaggle https://www.kaggle.com/), participants are provided with content from Wikipedia articles in 100+ language editions and are asked to build systems that automatically retrieve the text (an image caption, or an article title) closest to a query image. The data is a combination of Google AI’s recently released WIT dataset https://github.com/google-research-datasets/wit and a new dataset of 6 million images from Wikimedia Commons that we have released https://analytics.wikimedia.org/published/datasets/one-off/caption_competition/ for this competition. Kaggle is hosting all data needed to get started with the task, example notebooks, a forum for participants to share and collaborate, and submitted models in open-source formats.
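To give a feel for the image-to-text retrieval task, here is a purely illustrative baseline sketch: it ranks a handful of candidate captions for one image by cosine similarity in a shared image-text embedding space. This is not part of the competition materials or any recommended solution; it assumes the sentence-transformers library and its pretrained "clip-ViT-B-32" checkpoint (English-only, so it ignores the multilingual aspect of the challenge), and the image file and captions below are placeholders, not competition data.

    # Illustrative only: rank candidate captions for one image by cosine similarity.
    # Assumes the sentence-transformers library and its pretrained CLIP checkpoint;
    # "example.jpg" and the candidate captions are placeholders, not competition data.
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    # Encodes images and (English) text into the same embedding space.
    model = SentenceTransformer("clip-ViT-B-32")

    image_embedding = model.encode(Image.open("example.jpg"))
    candidate_captions = [
        "A red double-decker bus in central London",
        "A close-up of a honeybee on a sunflower",
        "Aerial view of the Amazon rainforest",
    ]
    caption_embeddings = model.encode(candidate_captions)

    # Higher cosine similarity means a better match; keep the top-scoring caption.
    scores = util.cos_sim(image_embedding, caption_embeddings)[0]
    best = scores.argmax().item()
    print(candidate_captions[best], float(scores[best]))

Real submissions would, of course, need to handle the 100+ languages in the data and score captions and article titles at a much larger scale; the Kaggle example notebooks are the best starting point for that.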
We encourage everyone to download our data and participate in the competition. This challenge is an opportunity for people around the world to grow their technical skills while increasing the accessibility of Wikipedia.
This competition is possible thanks to collaborations with Google Research https://research.google/, EPFL https://www.epfl.ch/en/, Naver Labs Europe https://europe.naverlabs.com/ and Hugging Face https://huggingface.co/, who assisted with data preparation and competition design. Check out our blog post https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-challenge-and-a-huge-release-of-image-data-for-research/ for more information! The point of contact for this project is Miriam Redi. You're welcome to reach out with questions or comments at miriam@wikimedia.org.
Cheers,
Emily Lescak, on behalf of the Research team
wikimedia-l@lists.wikimedia.org