Hi all,
To help bridge Wikipedia’s visual knowledge gaps, the Research team https://research.wikimedia.org/ at the Wikimedia Foundation has launched the “Wikipedia Image/Caption Matching Competition https://www.kaggle.com/c/wikipedia-image-caption”.
Read on for more information or check out our blog post https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-challenge-and-a-huge-release-of-image-data-for-research/ !
Images are essential for knowledge sharing, learning, and understanding. However, the majority of images in Wikipedia articles lack written context (e.g., captions, alt-text), often making them inaccessible. As part of our initiatives https://research.wikimedia.org/knowledge-gaps.html to address Wikipedia’s knowledge gaps, the Research https://research.wikimedia.org/ team at the Wikimedia Foundation is hosting the “Wikipedia Image/Caption Matching Competition https://www.kaggle.com/c/wikipedia-image-caption.” We invite communities of volunteers, developers, data scientists, and machine learning enthusiasts to develop systems that can automatically associate images with their corresponding captions and article titles.
In this competition (hosted on Kaggle https://www.kaggle.com/), participants are provided with content from Wikipedia articles in 100+ language editions and are asked to build systems that automatically retrieve the text (an image caption, or an article title) closest to a query image. The data is a combination of Google AI’s recently released WIT dataset https://github.com/google-research-datasets/wit and a new dataset of 6 million images from Wikimedia Commons that we have released https://analytics.wikimedia.org/published/datasets/one-off/caption_competition/ for this competition. Kaggle is hosting all data needed to get started with the task, example notebooks, a forum for participants to share and collaborate, and submitted models in open-source formats.
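To give a feel for the image-to-text retrieval task, here is a purely illustrative baseline sketch: it ranks a handful of candidate captions for one image by cosine similarity in a shared image-text embedding space. This is not part of the competition materials or any recommended solution; it assumes the sentence-transformers library and its pretrained "clip-ViT-B-32" checkpoint (English-only, so it ignores the multilingual aspect of the challenge), and the image file and captions below are placeholders, not competition data.

    # Illustrative only: rank candidate captions for one image by cosine similarity.
    # Assumes the sentence-transformers library and its pretrained CLIP checkpoint;
    # "example.jpg" and the candidate captions are placeholders, not competition data.
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    # Encodes images and (English) text into the same embedding space.
    model = SentenceTransformer("clip-ViT-B-32")

    image_embedding = model.encode(Image.open("example.jpg"))
    candidate_captions = [
        "A red double-decker bus in central London",
        "A close-up of a honeybee on a sunflower",
        "Aerial view of the Amazon rainforest",
    ]
    caption_embeddings = model.encode(candidate_captions)

    # Higher cosine similarity means a better match; keep the top-scoring caption.
    scores = util.cos_sim(image_embedding, caption_embeddings)[0]
    best = scores.argmax().item()
    print(candidate_captions[best], float(scores[best]))

Real submissions would, of course, need to handle the 100+ languages in the data and score captions and article titles at a much larger scale; the Kaggle example notebooks are the best starting point for that.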
We encourage everyone to download our data and participate in the competition. This challenge is an opportunity for people around the world to grow their technical skills while increasing the accessibility of Wikipedia.
This competition is possible thanks to collaborations with Google Research https://research.google/, EPFL https://www.epfl.ch/en/, Naver Labs Europe https://europe.naverlabs.com/ and Hugging Face https://huggingface.co/, who assisted with data preparation and competition design. Check out our blog post https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-challenge-and-a-huge-release-of-image-data-for-research/ for more information! The point of contact for this project is Miriam Redi. You're welcome to reach out with questions or comments at miriam@wikimedia.org.
Cheers,
Emily Lescak, on behalf of the Research team
wikimedia-l@lists.wikimedia.org