Blondell R Sykes
On Wednesday, April 21, 2021, 08:01:36 AM EDT,
1. We built two tools to help editors get a more complete
picture of the data quality on Wikidata (Mohammed Sadat Abdulai)
2. Re: We built two tools to help editors get a more complete
picture of the data quality on Wikidata (Hay (Husky))
Date: Tue, 20 Apr 2021 18:28:53 +0200
From: Mohammed Sadat Abdulai <mohammed.sadat(a)wikimedia.de>
To: Discussion list for the Wikidata project
Subject: [Wikidata] We built two tools to help editors get a more
complete picture of the data quality on Wikidata
This is to announce that over the past month we have started looking at
ways to help us all better understand the quality of Wikidata's data in a
specific area of interest. For this purpose we built two tools, an Item
Quality Evaluator and a Constraint Violation Checker. Both tools are now
available at:
Item Quality Evaluator <https://item-quality-evaluator.toolforge.org>
Constraint Violation Checker
Data quality on Wikidata has many aspects. The constraint violations and
ORES quality scores that these tools use are two useful indicators of
certain aspects of quality, and we hope they will be helpful to you.
As you may know, Wikidata’s data quality is very unevenly distributed:
some areas are very well maintained and others not so much. However, we
currently only provide ORES quality scores on a global and per-Item level.
This has two effects:
1. Editors taking care of a specific area of Wikidata want to improve that
area but currently don’t have an easy way to find the Items with the lowest
quality, which they could focus their time on in order to raise the quality
of that area.
2. Re-users of Wikidata’s data are usually only interested in a subset of
Wikidata’s Items and, by extension, in the quality of that subset. It is
currently hard for them to know what quality level they are getting for
their subset of interest.
To address these issues we put together two small tools. The Item Quality
Evaluator is a simple website that provides ORES quality scores for a list
of Items in Wikidata. The Constraint Violation Checker is a small
command-line script that retrieves the number of constraint violations and
the ORES scores for a list of Items for further analysis.
How does the Item Quality Evaluator tool work?
You provide it with a list of Item IDs or a SPARQL query, and it fetches
the ORES score for each Item as well as the average score over all the
provided Items, and displays them on a webpage. This way, you can more
easily identify the Items with the lowest quality in an area you are
interested in and focus your time on them.
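As a rough illustration of the scoring and averaging step, the following
Python sketch ranks Items from lowest to highest quality and computes their
average. It assumes the ORES `itemquality` class probabilities (classes A,
the best, to E, the worst) have already been fetched for each Item, and it
turns them into one number by weighting the classes 5 down to 1. The
weights, the helper names and the example probabilities are hypothetical
and are not taken from the Item Quality Evaluator itself.

```python
from statistics import mean

# Assumed numeric weight per ORES itemquality class (A = best, E = worst).
CLASS_WEIGHTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def weighted_score(probabilities: dict[str, float]) -> float:
    """Collapse class probabilities into one number between 1 and 5."""
    return sum(CLASS_WEIGHTS[cls] * p for cls, p in probabilities.items())


def summarize(scores_by_item: dict[str, dict[str, float]]) -> tuple[list[tuple[str, float]], float]:
    """Return Items sorted from lowest to highest score, plus the average score."""
    per_item = {item: weighted_score(probs) for item, probs in scores_by_item.items()}
    ranked = sorted(per_item.items(), key=lambda pair: pair[1])
    return ranked, mean(per_item.values())


# Made-up class probabilities for three Items (not real ORES output).
example = {
    "Q1": {"A": 0.1, "B": 0.6, "C": 0.2, "D": 0.1, "E": 0.0},
    "Q2": {"A": 0.0, "B": 0.0, "C": 0.1, "D": 0.3, "E": 0.6},
    "Q3": {"A": 0.0, "B": 0.2, "C": 0.5, "D": 0.3, "E": 0.0},
}
ranked, average = summarize(example)
for item, score in ranked:
    print(f"{item}: {score:.2f}")  # lowest-quality Items are printed first
print(f"average: {average:.2f}")
```

A probability-weighted score is used here instead of the single predicted
class because it separates Items that share the same top class; the tool
itself may present its scores differently.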
How does the Constraint Violation Checker script work?
When you run it, it outputs a CSV file with, for each Item in the list, the
number of statements, the number of constraint violations for each severity
level, the number of sitelinks to all projects and to Wikipedia, and the
ORES score.
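A minimal Python sketch of assembling one such CSV row per Item is shown
below. It assumes the constraint checks for each Item have already been
collected as a flat list of result statuses (the severity names
`violation`, `warning` and `suggestion` are assumptions) and that sitelinks
are given as site IDs such as `enwiki`. The column names, the helpers
`build_row` and `write_csv`, and the rule used to spot Wikipedia sitelinks
are hypothetical; the actual script's input handling and CSV header may
differ.

```python
import csv
import io
from collections import Counter

# Hypothetical severity levels and CSV columns.
SEVERITIES = ("violation", "warning", "suggestion")
FIELDS = ["item", "statements", *SEVERITIES, "sitelinks", "wikipedia_sitelinks", "ores_score"]

# Assumption: Wikipedia site IDs end in "wiki" (e.g. "enwiki"), while sister
# projects such as "enwikiquote" do not. A few non-Wikipedia sites also end in
# "wiki", so they are excluded explicitly (illustrative, not exhaustive).
NON_WIKIPEDIA_SITES = {"commonswiki", "specieswiki", "metawiki", "wikidatawiki", "mediawikiwiki"}


def is_wikipedia(site_id: str) -> bool:
    """Heuristic check for a Wikipedia sitelink (see the assumption above)."""
    return site_id.endswith("wiki") and site_id not in NON_WIKIPEDIA_SITES


def build_row(item_id: str, statements: int, statuses: list[str],
              sitelinks: list[str], ores_score: float) -> dict:
    """Count constraint results per severity and sitelinks for one Item."""
    counts = Counter(statuses)
    row = {"item": item_id, "statements": statements}
    for severity in SEVERITIES:
        row[severity] = counts[severity]  # Counter returns 0 for missing keys
    row["sitelinks"] = len(sitelinks)
    row["wikipedia_sitelinks"] = sum(1 for site in sitelinks if is_wikipedia(site))
    row["ores_score"] = ores_score
    return row


def write_csv(rows: list[dict]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example data; statuses outside SEVERITIES (e.g. "compliance") are ignored.
rows = [
    build_row("Q1", 12, ["violation", "warning", "warning", "compliance"],
              ["enwiki", "dewiki", "enwikiquote", "commonswiki"], 3.7),
    build_row("Q2", 3, [], ["frwiki"], 1.5),
]
print(write_csv(rows))
```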
Why didn't we integrate the constraint violation data into the Item Quality
Evaluator?
We want to do that in the long term, but right now it is not possible
because the constraint violation data is not easily accessible, and
retrieving it for a large list of Items takes several hours.
Please try these tools and let us know if you encounter any issues. General
feedback is also very welcome.
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Local Court (Amtsgericht)
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.