Hello everyone,
We are excited to announce an upcoming collaboration between the Wikidata development team and data science students from Purdue University. The goal of this collaboration is to identify and address discrepancies between Wikidata and external data sources, and potentially to contribute useful new mismatches to the Mismatch Finder. More details about the project can be found here: Wikidata:Mismatch Finder/Collaboration/Purdue Summer of Data 2024 https://www.wikidata.org/wiki/Wikidata:Mismatch_Finder/Collaboration/Purdue_Summer_of_Data_2024
Project Overview
For those unfamiliar with the Mismatch Finder https://www.wikidata.org/wiki/Wikidata:Mismatch_Finder, it is a tool that identifies potential discrepancies between Wikidata items and external databases and presents them to editors for review and correction. It also suggests new statements that may belong in Wikidata but require human review before they are added. As part of this project, the students will provide mismatches for the Mismatch Finder and help address the discrepancies, with guidance and support along the way. All of their work will be open source and released under open licenses.
Your Participation Matters
We invite you to help us identify data sources that, when compared with Wikidata, could reveal significant mismatches. We are particularly interested in datasets that are free to use, easily accessible, and ideally relevant to data used on Wikipedia. We have started collecting potential data sources in T304448 https://phabricator.wikimedia.org/T304448, and we would welcome your additions to the list.
Beyond dataset suggestions, we would also appreciate your feedback on which specific parts of these datasets we should compare against Wikidata, since they are often large and broad in scope.
If you have any questions, concerns, or feedback, please leave us a note on the project's talk page: Wikidata talk:Mismatch Finder/Collaboration/Purdue Summer of Data 2024 https://www.wikidata.org/wiki/Wikidata_talk:Mismatch_Finder/Collaboration/Purdue_Summer_of_Data_2024
Cheers,