Hello everyone,
This is to announce that we have finalized the concept and started development of a tool to help editors work on mismatches between Wikidata's data and other databases/websites.
Why are we doing this?
Wikidata is becoming too big for editors to monitor individual data points. Additionally, keeping Wikidata’s data in sync with external databases (understood in the broadest sense) requires a lot of effort, and existing workflows are haphazard and one-off.
Who are we doing this for?
The target audience of this tool is tech-savvy editors whose primary goal is finding mismatches and improving data quality. They are:
Dedicated data quality workers who specifically seek out lists of issues in an area of their interest in order to fix mistakes
Heavily affected by data quality problems
Experienced in using bots and mass-editing gadgets
What is the solution?
We will build a system with a central store for mismatches. Different people and organizations can load mismatches they have found into this system. Various tools can then retrieve mismatches from the system to help editors resolve them.
Mismatches can come from many sources. We will start with mismatches that we found as part of previous work to find references for statements lacking references (aka Reference Treasure Hunt). In the future, categories on Wikipedia that indicate a mismatch between the local value on that Wikipedia and the corresponding value on Wikidata could also serve as a source. Various research organizations as well as large data re-users could also contribute mismatches they found in their internal processes when doing quality assurance on Wikidata’s data.
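To make the idea of a mismatch store more concrete, here is a rough sketch of what a single mismatch record might contain. The field names and values below are purely illustrative assumptions on our part, not the tool's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Mismatch:
    # All field names here are hypothetical, for illustration only.
    item_id: str          # Wikidata item, e.g. "Q42"
    property_id: str      # Wikidata property, e.g. "P569" (date of birth)
    wikidata_value: str   # value currently stated on Wikidata
    external_value: str   # conflicting value found in the external source
    external_source: str  # name of the external database reporting it

# Example: an external database disagrees with Wikidata about a date of birth.
example = Mismatch(
    item_id="Q42",
    property_id="P569",
    wikidata_value="1952-03-11",
    external_value="1952-03-12",
    external_source="Example Biographical Database",
)
print(example.item_id, example.property_id)
```

An uploader would submit records like this to the store, and an editing tool would fetch them, show both values side by side, and let the editor decide which one is correct.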
We hope this tool will make it easier for editors to find and fix mismatches between Wikidata’s data and other databases.
Feel free to ask questions or give us feedback at the discussion page: Wikidata_talk:Mismatch_Finder
Cheers,